Gene Expression Profiles to Predict Relapse of Prostate Cancer

ABSTRACT

The present disclosure provides a method for cancer relapse prediction that provides higher resolution grading than Gleason score alone. In particular, the method provides for prediction of prostate cancer relapse by deriving a prediction score that correlates gene expression of each individual signature gene with Gleason score; deriving a prostate cancer gene expression (GEX) score in a plurality of prostate cancer tissue samples; and correlating said GEX score with the clinical outcome for each prostate carcinoma tissue sample. A set of signature genes is provided that encompasses all or a sub-combination of GI_2094528, KIP2, NRG1, NBL1, Prostein, CCNE2, CDC6, FBP1, HOXC6, MKI67, MYBL2, PTTG1, RAMP, UBE2C, Wnt5A, MEMD, AZGP1, CCK, MLCK, PPAP2B, and PROK1.

This application is a continuation application of U.S. application Ser. No. 13/035,797, filed Feb. 25, 2011, which is a continuation application of U.S. application Ser. No. 11/732,481, filed Apr. 2, 2007, now U.S. Pat. No. 7,914,988, which claims the benefit of priority to U.S. Provisional Application Ser. No. 60/787,868, filed Mar. 31, 2006, the entire contents of which are incorporated herein by reference in their entireties.

FIELD

This invention relates generally to gene expression profiling and, more specifically, to relapse prediction and clinical management of prostate cancer.

BACKGROUND

Prostate cancer is the most common cancer in American men and is the second leading cause of cancer death. Progress in treating human prostate cancer has been hampered by the finding that histologically identical cancers can exhibit widely variant clinical behavior. For example, in some men diagnosed with prostate cancer, the disease progresses slowly with a prolonged natural history while in other patients, disease progression can be rapid and definitive local therapy can be ineffective. The uncertainty regarding the appropriate clinical management of prostate cancer in many patients is related to an incomplete and unclear understanding of the molecular and genetic changes involved in prostate cancer development and disease progression.

A variety of clinical models or nomograms have been developed to aid clinicians with pre-treatment risk assessment. For example, since 1988, the routine use of serum prostate-specific antigen (PSA) testing in men at risk for prostate cancer has led to more favorable disease characteristics at presentation (stage migration) and earlier diagnosis and treatment. Several investigators have used these clinical parameters to stratify patients into risk groups (low, intermediate, high) and to predict clinical outcomes (nomograms). Despite these useful parameters, approximately 30% of patients with intermediate-risk prostate cancer fail standard treatment as evidenced by a rising serum PSA following definitive therapy. A better understanding of the molecular abnormalities that define these tumors at high risk for relapse is needed to help identify more precise biosignatures.

For patients newly diagnosed with prostate cancer, there are three well-defined predictors of disease extent and outcome following treatment. These factors are clinical tumor stage (T1-T4) by digital rectal examination, Gleason score of the diagnostic biopsy specimen and serum PSA level. However, each of these factors alone has not proven definitive in predicting disease extent and outcome for an individual patient. Clinical staging by digital rectal examination may underestimate the presence of extracapsular disease extension in 30-50% of patients. Although biopsy Gleason score may be helpful in predicting pathologic stage and outcome following treatment at either end of the spectrum (i.e. Gleason 2-4 or Gleason 8-10 tumors), it is not as helpful for the majority of patients who present with Gleason 5-7 disease. As risk assessment for patients newly diagnosed with prostate cancer continues to evolve, newer tools, such as genetic or molecular determinants, are needed to better predict the behavior of an individual tumor.

A need exists for large-scale discovery, validation, and clinical application of mRNA biosignatures of disease and for methods of genomic analysis in patients with established clinical prostate cancer disease to predict disease outcomes. The present invention satisfies this need and provides related advantages as well.

BRIEF SUMMARY

The present invention provides a method for preparing a reference model for cancer relapse prediction that provides higher resolution grading than Gleason score alone. The method encompasses obtaining from different individuals a plurality of prostate carcinoma tissue samples of known clinical outcome representing different Gleason scores; selecting a set of signature genes having an expression pattern that correlates positively or negatively in a statistically significant manner with the Gleason scores; independently deriving a prediction score that correlates gene expression of each individual signature gene with Gleason score for each signature gene in said plurality of prostate carcinoma tissue samples; deriving a prostate cancer gene expression (GEX) score that correlates gene expression of said set of signature genes with the Gleason score based on the combination of independently derived prediction scores in the plurality of prostate cancer tissue samples; and correlating said GEX score with the clinical outcome for each prostate carcinoma tissue sample. A set of signature genes is provided that encompasses all or a sub-combination of GI_2094528, KIP2, NRG1, NBL1, Prostein, CCNE2, CDC6, FBP1, HOXC6, MKI67, MYBL2, PTTG1, RAMP, UBE2C, Wnt5A, MEMD, AZGP1, CCK, MLCK, PPAP2B, and PROK1. Also provided are methods for predicting the probability of relapse of cancer in an individual and methods for deriving a prostate cancer gene expression (GEX) score for a prostate carcinoma tissue sample obtained from an individual.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows the correlation between gene expression signature and Gleason score. 5-N, 6-N, 7-N and 8-N correspond respectively to patient groups with Gleason score of 5, 6, 7 and 8 without relapse. 6-Y, 7-Y, 8-Y and 9-Y correspond respectively to patient groups with Gleason score of 6, 7, 8 and 9 with relapse.

FIG. 2 shows a plot of percentage of relapse cases vs. the GEX score.

FIG. 3 shows a Receiver Operating Characteristic (ROC) curve for relapse prediction in prostate cancer. False positive is defined as a case with no relapse, but high score. The connected line shows the performance of GEX and the dotted line shows the performance of Gleason score.

FIG. 4 shows Kaplan-Meier analysis of relapse (N=71). X-axis: Time to relapse (days); Y-axis: Relapse free probability. (A) Relapse prediction based on expression signature, GEX>7.337 (p=0.0044); (B) Relapse prediction based on Gleason score, Gleason score>7 (p=0.138).

FIG. 5 shows immunohistochemical stains of HOXC6 and Ki67 in tumor cells. Immunohistochemical stains for HOXC6 showed strong nuclear stains in the prostate cancer cells (A) whereas the normal adjacent glands (B) did not show any staining. Ki67 showed variable nuclear stains in the tumor cells (C), but much less stain in the nearby normal prostate glands (D). All magnifications are 400×.

FIG. 6 shows FFPE-derived RNA metrics. Bioanalyzer traces for 10 FFPE-derived prostate cancer RNAs are shown. These traces represent the more marginal FFPEs. Ct was measured for a 90 bp amplicon from the RPL13A gene using SYBR Green detection. R² represents the correlation between replicate DASL assays for the same cDNA run in parallel. Numbers in red identify samples with poor performance.

FIG. 7 shows sample QC (by qPCR) and array data quality assessment. Highly reproducible gene expression data are obtained with samples that have up to 8 cycle difference in qPCR (i.e. ~170-fold difference in “PCR-able” RNA input) for a housekeeping gene RPL13A. Samples are pre-qualified for array analysis by qPCR, using a Ct number of 28 as the cutoff.

FIG. 8 shows a plot of the distribution of survival p-values for various collections of signature genes selected among the following 7 genes: MKI67, GI_2094528, HOXC6, CCK, memD, FBP1, and CDC6. The same data is plotted in panels A and B but displayed with different scale on the y axis.

FIG. 9 shows Kaplan-Meier analysis of relapse (N=71) based on the GEX score derived from the following 7 genes: MKI67, GI_2094528, HOXC6, CCK, memD, FBP1, and CDC6. X-axis: Time to relapse (days); Y-axis: Relapse free probability.

DETAILED DESCRIPTION

This invention is directed to methods for gene expression profiling in cancer tissues and the identification of cancer diagnostic and prognostic biomarkers. In particular, the method of the invention allows for the establishment of a molecular signature based on identifying a correlation between the gene expression of each member of a set of signature genes in a cell population derived from a prostate carcinoma tissue sample with the Gleason score corresponding to the tissue sample. It has been discovered that the molecular signature can be expressed as a combined score, termed the Gene Expression Score (“GEX score”), which allows for relapse prediction more sensitive than relapse prediction based solely on Gleason score. For each member of the set of signature genes that contribute to the GEX score, a highly reproducible correlation of gene expression, either positive or negative, with Gleason score can be calculated and incorporated into the GEX score.

The present invention provides a method for establishing a model, also referred to as a “reference model,” for prostate relapse prediction against which any individual patient sample can be compared to predict relapse probability more accurately than Gleason score alone. The invention also provides a method for deriving a prostate cancer gene expression score by obtaining a cell population from a tissue sample obtained from an individual, calculating the GEX score in the tissue sample, and comparing the GEX score to an established model for relapse prediction. The invention also provides a set of signature genes for prostate cancer relapse prediction.

In a particular embodiment, the invention provides a method for preparing a model for cancer relapse prediction. This method comprises the following steps: (a) obtaining a plurality of cell populations from prostate carcinoma tissues representing different Gleason scores, wherein the clinical outcome corresponding to each of the prostate carcinoma tissues is known; (b) selecting a set of signature genes having an expression pattern that correlates positively or negatively with the Gleason scores in prostate cancer patients; (c) statistically deriving a prediction score for each signature gene in the plurality of cell populations, wherein the prediction score correlates gene expression of each individual signature gene with Gleason score; (d) statistically deriving a prostate cancer gene expression score by calculating the average of the independently derived prediction scores in the plurality of tissue samples, wherein the prostate cancer gene expression score correlates gene expression of the set of signature genes with the Gleason score; and (e) establishing a model for relapse prediction from the plurality of cell populations that describes the association between gene expression score and probability of relapse for prostate cancer.
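By way of non-limiting illustration only, steps (c) and (d) can be sketched as follows. The sketch assumes that each per-gene prediction score is the fitted value of a simple least-squares regression of Gleason score on that gene's expression level; the disclosure states only that the prediction scores are statistically derived and then averaged, so the regression form, the function names and the data layout are all assumptions made for this example.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares slope and intercept for y ~ slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_reference(expression, gleason):
    """Step (c), assumed form: one regression line per signature gene.

    expression: {gene: [expression level per training sample]}
    gleason:    [Gleason score per training sample], same sample order
    """
    return {gene: fit_line(levels, gleason) for gene, levels in expression.items()}


def gex_score(model, sample):
    """Step (d): average of the per-gene prediction scores for one sample.

    sample: {gene: expression level}; must contain every gene in the model.
    """
    return mean(
        slope * sample[gene] + intercept
        for gene, (slope, intercept) in model.items()
    )
```

Because a negatively correlated gene receives a negative slope in this sketch, positively and negatively correlated signature genes both contribute to the average on the Gleason scale.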

In a particular embodiment, the invention provides a method for predicting the probability of relapse of prostate cancer in an individual by performing the steps of (a) providing expression levels for a collection of signature genes from a test individual; (b) deriving a score that captures the expression levels for the collection of signature genes; (c) providing a model comprising information correlating the score with prostate cancer relapse; and (d) comparing the score to the reference model, thereby determining the probability of prostate cancer relapse for the individual.
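As a hedged sketch of steps (c) and (d), a reference model can be represented as the observed fraction of relapse cases within bins of the score, against which a test score is looked up. The binned representation, the bin edges and the function names are assumptions chosen for this example; the disclosure does not limit the reference model to this form.

```python
from bisect import bisect_right


def build_reference(scores, relapsed, edges):
    """Fraction of relapse cases per score bin.

    scores:   score of each training sample
    relapsed: 1/True if the corresponding sample relapsed, else 0/False
    edges:    ascending inner bin boundaries (hypothetical values)
    """
    counts = [[0, 0] for _ in range(len(edges) + 1)]  # [relapses, total] per bin
    for score, event in zip(scores, relapsed):
        b = bisect_right(edges, score)
        counts[b][0] += int(event)
        counts[b][1] += 1
    # Empty bins carry no estimate.
    return [r / n if n else None for r, n in counts]


def relapse_probability(reference, edges, score):
    """Compare a test score to the reference model (step (d))."""
    return reference[bisect_right(edges, score)]
```

A test score is assigned the relapse fraction of the bin into which it falls; a bin with no training samples returns None rather than a guessed value.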

Signature genes of the invention, which are differentially expressed within prostate carcinomas and which are further positively or negatively correlated with Gleason score, are as follows: GI_2094528, KIP2, NRG1, NBL1, Prostein, CCNE2, CDC6, FBP1, HOXC6, MKI67, MYBL2, PTTG1, RAMP, UBE2C, Wnt5A, MEMD, AZGP1, CCK, MLCK, PPAP2B, and PROK1. Of the 21 signature genes, twelve are positively correlated with Gleason score: GI_2094528, CCNE2, CDC6, FBP1, HOXC6, MKI67, MYBL2, PTTG1, RAMP, UBE2C, Wnt5A, MEMD; and nine are negatively correlated with Gleason score: KIP2, NRG1, NBL1, Prostein, AZGP1, CCK, MLCK, PPAP2B, and PROK1.
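For convenience in the illustrative sketches, the signature genes and their direction of correlation with Gleason score can be held in a simple lookup table. The gene names and directions are those recited above; the variable names and the +1/-1 encoding are assumptions of this example.

```python
# Genes positively correlated with Gleason score (twelve).
POSITIVE = [
    "GI_2094528", "CCNE2", "CDC6", "FBP1", "HOXC6", "MKI67",
    "MYBL2", "PTTG1", "RAMP", "UBE2C", "Wnt5A", "MEMD",
]

# Genes negatively correlated with Gleason score (nine).
NEGATIVE = [
    "KIP2", "NRG1", "NBL1", "Prostein", "AZGP1",
    "CCK", "MLCK", "PPAP2B", "PROK1",
]

# Direction of correlation: +1 positive, -1 negative.
SIGNATURE_GENES = {
    **{gene: +1 for gene in POSITIVE},
    **{gene: -1 for gene in NEGATIVE},
}
```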

The sensitivity and specificity of the molecular signature derived from these 21 signature genes, or subset thereof, has utility for patients undergoing prostate biopsy for diagnosis of carcinoma based on applicability of the methods described herein to diagnosis as well as prognosis through biopsy samples. Furthermore, the present invention enables the development of a diagnostic test that is technically simple and applicable for routine clinical use, and incorporation into existing prostate cancer nomograms (Group TTABPW, Nat Rev Genet 5:229-37 (2004); Ramaswamy, N Engl J Med 350:1814-6 (2004); Sullivan Pepe et al. J Natl Cancer Inst 93:1054-61 (2001)).

The articles “a” and “an” are used herein to refer to one or to more than one (i.e. to at least one) of the grammatical object of the article. By way of example, “an element” means one element or more than one element, unless explicitly indicated to the contrary.

“Prostate cancer” as used herein includes carcinomas, including carcinoma in situ, invasive carcinoma, metastatic carcinoma and pre-malignant conditions.

As used herein the term “comprising” means that the named elements are essential, but other signature genes or claim elements may be added and still represent a composition or method within the scope of the claim. The transitional phrase “consisting essentially of” means that the claimed composition or method encompasses additional elements, including, for example, additional signature genes, that do not affect the basic and novel characteristics of the claimed invention. The transitional phrase “comprising essentially” means that the claimed composition or method encompasses additional elements, including, for example, additional signature genes or claim elements, that do not substantially affect the basic and novel characteristics of the claimed invention.

As used herein, the term “signature gene” refers to a gene whose expression is correlated, either positively or negatively, with disease extent or outcome or with another predictor of disease extent or outcome. A gene expression score (GEX) can be statistically derived from the expression levels of a set of signature genes and used to diagnose a condition or to predict clinical course. A “signature nucleic acid” is a nucleic acid comprising or corresponding to, in case of cDNA, the complete or partial sequence of an RNA transcript encoded by a signature gene, or the complement of such complete or partial sequence. A signature protein is encoded by or corresponding to a signature gene of the invention.

The term “relapse prediction” is used herein to refer to the prediction of the likelihood of cancer recurrence in patients with no apparent residual tumor tissue after treatment. The predictive methods of the present invention can be used clinically to make treatment decisions by choosing the most appropriate treatment modalities for any particular patient. The predictive methods of the present invention also can provide valuable tools in predicting if a patient is likely to respond favorably to a treatment regimen, such as surgical intervention, chemotherapy with a given drug or drug combination, and/or radiation therapy.

In the exemplified embodiments relating to cancer relapse prediction, the gene expression of 21 signature genes, or subsets thereof, differentially expressed in prostate cancer is correlated to Gleason scores. The Gleason grading system is based on the glandular pattern of the tumor. Gleason grade takes into account the ability of the tumor to form glands. A pathologist, using relatively low magnification, performs the histologic review necessary for assigning the Gleason grade. The range of grades is 1-5: 1, 2 and 3 are considered to be low to moderate in grade; 4 and 5 are considered to be high grade. The prognosis for a given patient generally falls somewhere between that predicted by the primary grade and a secondary grade given to the second most prominent glandular pattern. When the two grades are added the resulting number is referred to as the Gleason score. The Gleason score is a more accurate predictor of outcome than either of the individual grades. Thus, the traditionally reported Gleason score will be the sum of two numbers between 1-5 with a total score from 2-10. It is unusual for the primary and secondary Gleason grade to differ by more than one, such that the only way that there can be a Gleason score 7 tumor is if the primary or secondary Gleason grade is 4. Because of the presence of grade 4 glandular patterns in tissue having Gleason score 7, these tumors can behave in a much more aggressive fashion than those having Gleason score 6. In a recent study of over 300 patients, the disease specific survival for Gleason score 7 patients was 10 years. In contrast, Gleason score 6 patients survived 16 years and Gleason score 4-5 patients survived 20 years. It is therefore clear that the prognosis for men with Gleason score 7 tumors is worse than for men with Gleason score 5 and 6 tumors. Under certain circumstances it is suggested that men with Gleason 7 tumors can be considered for clinical trials.
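The arithmetic of the Gleason score described above can be illustrated with a minimal sketch. The function name and the error handling are assumptions of this example; only the grade range of 1-5 and the summation of primary and secondary grades are taken from the description.

```python
def gleason_score(primary, secondary):
    """Sum of the primary and secondary Gleason grades (each 1-5), giving 2-10."""
    for grade in (primary, secondary):
        if grade not in range(1, 6):
            raise ValueError(f"Gleason grade must be 1-5, got {grade!r}")
    return primary + secondary


# Gleason 3+4 and Gleason 4+3 both give Gleason score 7, which is why the
# score alone does not record which pattern is the primary one.
assert gleason_score(3, 4) == gleason_score(4, 3) == 7
```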

As disclosed herein, the gene expression score (GEX) derived from the expression levels of signature genes can be used to predict relapse of prostate cancer. A “GEX score” or “score” is a value that captures the expression levels for a collection of signature genes. Notably, the GEX score serves as a continuous analog of the Gleason score with increased molecular resolution, especially at a Gleason score between 7 and 8, in which relapse probability of patients can be stratified beyond the capability of the Gleason score alone because the GEX score provides a more sensitive predictor of relapse for prostate cancer, especially in this range. As disclosed in Example I below, a significant difference exists in the mean of the GEX score between GS7 patients who had relapse (GS7-Y) and those without relapse (GS7-N), making the GEX score a valuable source of information for the very patients faced with the most critical decision point along the Gleason score spectrum.

Furthermore, while it is established that Gleason score 7 tumors behave in a more aggressive fashion than do Gleason 5 and 6 tumors, it has been less clear whether or not it makes a difference if the primary or secondary pattern accounts for the Gleason grade 4. Some studies have shown Gleason 4+3 to be a worse prognostic sign than is Gleason 3+4 (where Gleason x+y represents Gleason score derived from primary grade “x” and secondary grade “y”). However, this has not always been the case and there are studies that find no difference between the two scores. To date, there has not been a prospective study that has attempted to answer this question and it has remained a controversial point. The present invention confirms that among Gleason score 7 patients, a primary Gleason grade of 4 is consistent with a significantly higher relapse probability. Among the GS7 patients, 21 are (3+4) and 11 are (4+3), of which 1 and 4, respectively, relapsed (Fisher's exact test p=0.037). A mean GEX score of 7.236 and 7.305 for the two groups, respectively (p=0.071 for hypothesis testing increased GEX score for 4+3 patients) was obtained.
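The Fisher's exact test value quoted above can be checked with a short hypergeometric calculation. The sketch below sums the probabilities of observing the reported number of relapses, or fewer, in the 3+4 group with all table margins fixed; the function name and argument layout are assumptions of this example, and no statistical library is required.

```python
from math import comb


def fisher_lower_tail(relapse_a, total_a, relapse_b, total_b):
    """P(relapses in group A <= observed), with all margins of the 2x2 table fixed."""
    n = total_a + total_b          # all patients
    k = relapse_a + relapse_b      # all relapses
    lowest = max(0, k - total_b)   # fewest relapses group A can possibly have
    favorable = sum(
        comb(total_a, x) * comb(total_b, k - x)
        for x in range(lowest, relapse_a + 1)
    )
    return favorable / comb(n, k)


# 21 patients with Gleason 3+4 (1 relapse) and 11 with Gleason 4+3 (4 relapses).
p = fisher_lower_tail(relapse_a=1, total_a=21, relapse_b=4, total_b=11)
print(round(p, 3))  # 0.037
```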

The ability to stratify individuals having a Gleason score between 7 and 8 based on probability of relapse is significant because a Gleason score of 7 is a crucial decision point that necessitates seeking treatment as early as possible, considering therapies beyond surgery and radiation, and having frequent follow-ups.

Among the signature genes disclosed herein is MEMD, a cell adhesion molecule often found in metastasizing human melanoma cell lines. The intact cell adhesion function of MEMD can both favor primary tumor growth and represent a rate-limiting step for tissue invasion from vertical growth phase melanoma. Overexpression of this molecule is associated with tumor invasion and nodal metastasis in esophageal squamous cell carcinoma. MEMD expression is up-regulated in low-grade prostate tumors and down-regulated in high-grade tumors and can play a role in progression of prostate cancer. Also among the signature genes, HOXC6 was identified as a good signature for prediction of prostate cancer patient outcome, as was UBE2C, ubiquitin-conjugating enzyme E2C, which is highly expressed in various human primary tumors (e.g. anaplastic thyroid carcinomas) and has an ability to promote cell growth and malignant transformation. A further signature gene is WNT5A: wingless-type MMTV integration site family, member 5A. The WNT gene family consists of structurally related genes which encode secreted signaling proteins, which have been implicated in oncogenesis and in several developmental processes, including regulation of cell fate and patterning during embryogenesis. Wnt-5a serves as an antagonist to the canonical Wnt-signaling pathway with tumor suppressor activity in differentiated thyroid carcinomas and is involved in the response of malignant neuroblasts to retinoic acid. Up-regulation of Wnt-5a is a signature of the malignant phenotype of human squamous cell carcinoma, and frequent up-regulation of WNT5A mRNA exists in primary gastric cancer. Increased expression of WNT5A is associated with cell motility and invasion of metastatic melanoma. The gene product of a further signature gene, PTTG1 (pituitary tumor-transforming 1), has transforming activity in vitro and tumorigenic activity in vivo, and the gene is highly expressed in various tumors.

The invention also provides a collection of isolated prostate cancer signature genes consisting essentially of CCNE2, CDC6, FBP1, HOXC6, MKI67, MYBL2, PTTG1, RAMP, UBE2C, Wnt5A, MEMD, AZGP1, CCK, MLCK, PPAP2B and PROK1. The invention also provides a collection of isolated prostate cancer signature genes consisting of CCNE2, CDC6, FBP1, HOXC6, MKI67, MYBL2, PTTG1, RAMP, UBE2C, Wnt5A, MEMD, AZGP1, CCK, MLCK, PPAP2B and PROK1. The invention also provides a collection of isolated prostate cancer signature genes comprising CCNE2, CDC6, FBP1, HOXC6, MKI67, MYBL2, PTTG1, RAMP, UBE2C, Wnt5A, MEMD, AZGP1, CCK, MLCK, PPAP2B and PROK1.

The invention also provides a collection of isolated prostate cancer signature genes consisting essentially of GI_2094528, KIP2, NRG1, NBL1, Prostein, CCNE2, CDC6, FBP1, HOXC6, MKI67, MYBL2, PTTG1, RAMP, UBE2C, Wnt5A, MEMD, AZGP1, CCK, MLCK, PPAP2B and PROK1. The invention also provides a collection of isolated prostate cancer signature genes consisting of GI_2094528, KIP2, NRG1, NBL1, Prostein, CCNE2, CDC6, FBP1, HOXC6, MKI67, MYBL2, PTTG1, RAMP, UBE2C, Wnt5A, MEMD, AZGP1, CCK, MLCK, PPAP2B and PROK1. The invention also provides a collection of isolated prostate cancer signature genes comprising GI_2094528, KIP2, NRG1, NBL1, Prostein, CCNE2, CDC6, FBP1, HOXC6, MKI67, MYBL2, PTTG1, RAMP, UBE2C, Wnt5A, MEMD, AZGP1, CCK, MLCK, PPAP2B and PROK1.

The invention also provides a collection of isolated prostate cancer signature genes comprising any subset of the 21 genes set forth above including, for example, at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 of the 21 genes. The invention also provides a collection of isolated prostate cancer signature genes comprising CCNE2, CDC6, FBP1, HOXC6, MKI67, MYBL2, PTTG1, RAMP, UBE2C, Wnt5A, MEMD, AZGP1, CCK, MLCK and PPAP2B. The invention also provides a collection of isolated prostate cancer signature genes comprising CCNE2, CDC6, FBP1, HOXC6, MKI67, MYBL2, PTTG1, RAMP, UBE2C, Wnt5A, MEMD, AZGP1, CCK, and PROK1. The invention also provides a collection of isolated prostate cancer signature genes comprising CCNE2, CDC6, FBP1, HOXC6, MKI67, MYBL2, PTTG1, RAMP, UBE2C, Wnt5A, MEMD, AZGP1, MLCK, and PPAP2B. The invention also provides a collection of isolated prostate cancer signature genes comprising CCNE2, CDC6, FBP1, HOXC6, MKI67, MYBL2, PTTG1, RAMP, UBE2C, Wnt5A, MEMD, CCK, MLCK, PPAP2B and PROK1. The invention also provides a collection of isolated prostate cancer signature genes comprising CCNE2, CDC6, FBP1, HOXC6, MKI67, MYBL2, PTTG1, RAMP, UBE2C, Wnt5A, AZGP1, CCK, MLCK, PPAP2B and PROK1. The invention also provides a collection of isolated prostate cancer signature genes comprising CCNE2, CDC6, FBP1, HOXC6, MKI67, MYBL2, PTTG1, RAMP, UBE2C, MEMD, AZGP1, CCK, MLCK, PPAP2B and PROK1. The invention also provides a collection of isolated prostate cancer signature genes comprising CCNE2, CDC6, FBP1, HOXC6, MKI67, MYBL2, PTTG1, RAMP, Wnt5A, MEMD, AZGP1, CCK, MLCK, PPAP2B and PROK1. The invention also provides a collection of isolated prostate cancer signature genes comprising CCNE2, CDC6, FBP1, HOXC6, MKI67, MYBL2, PTTG1, UBE2C, Wnt5A, MEMD, AZGP1, CCK, MLCK, PPAP2B and PROK1. The invention also provides a collection of isolated prostate cancer signature genes comprising CCNE2, CDC6, FBP1, HOXC6, MKI67, MYBL2, RAMP, UBE2C, Wnt5A, MEMD, AZGP1, CCK, MLCK, PPAP2B and PROK1. The invention also provides a collection of isolated prostate cancer signature genes comprising CCNE2, CDC6, FBP1, HOXC6, MKI67, PTTG1, RAMP, UBE2C, Wnt5A, MEMD, AZGP1, CCK, MLCK, PPAP2B and PROK1. The invention also provides a collection of isolated prostate cancer signature genes comprising CCNE2, CDC6, FBP1, HOXC6, MYBL2, PTTG1, RAMP, UBE2C, Wnt5A, MEMD, AZGP1, CCK, MLCK, PPAP2B and PROK1. The invention also provides a collection of isolated prostate cancer signature genes comprising CCNE2, CDC6, FBP1, MKI67, MYBL2, PTTG1, RAMP, UBE2C, Wnt5A, MEMD, AZGP1, CCK, MLCK, PPAP2B and PROK1. The invention also provides a collection of isolated prostate cancer signature genes comprising CCNE2, CDC6, HOXC6, MKI67, MYBL2, PTTG1, RAMP, UBE2C, Wnt5A, MEMD, AZGP1, CCK, MLCK, PPAP2B and PROK1. The invention also provides a collection of isolated prostate cancer signature genes comprising CCNE2, FBP1, HOXC6, MKI67, MYBL2, PTTG1, RAMP, UBE2C, Wnt5A, MEMD, AZGP1, CCK, MLCK, PPAP2B and PROK1. The invention also provides a collection of isolated prostate cancer signature genes comprising CDC6, FBP1, HOXC6, MKI67, MYBL2, PTTG1, RAMP, UBE2C, Wnt5A, MEMD, AZGP1, CCK, MLCK, PPAP2B and PROK1. All of the signature genes can be present in “isolated” form.

One skilled in the art can readily determine other combinations of signature genes sufficient to practice the inventions claimed herein. For example, based on the Pearson's correlation coefficient shown in Table 2 or Table 4, one skilled in the art can readily determine a sub-combination of prostate cancer signature genes suitable for methods of the invention. Those exemplary genes having lowest correlation can be excluded, with the remaining genes providing a sufficient collection of isolated prostate cancer signature genes suitable for relapse prediction of prostate cancer. For example, of those genes having a negative correlation, the CCK gene has the lowest correlation, and therefore removing the CCK gene is expected to have the least effect on overall accuracy of the GEX score. Similarly, of those genes having a positive correlation, removing the UBE2C gene is expected to have the least effect on overall accuracy of the GEX score. One skilled in the art can readily recognize these or other appropriate genes that can be omitted from the 21 identified prostate cancer signature genes and still be sufficient for methods of the invention.

Alternatively, one skilled in the art can remove any one or a few of the 21 identified prostate cancer signature genes so long as those remaining provide a sufficient statistical correlation for use in methods of the invention. Exemplary collections of prostate cancer signature genes include, for example, those set forth above and in the Examples. It is readily recognized by one skilled in the art that these listed combinations are merely exemplary and that any of a number of such combinations can readily be determined by one skilled in the art. For example, a sub-combination can be selected by removing the signature gene(s) with the weakest correlation, for example, GI_2094528, NRG1, Prostein or UBE2C based on overall correlation, including positive and negative; or NRG1, Prostein or CCK based on the weakest negative correlation of the five signature genes having a negative correlation with GS. HOXC6 and RAMP have the strongest and second strongest correlation, respectively, to GS of the sixteen signature genes and can therefore be included in a sub-combination based on this statistic. However, it is understood that, given the set of 21 signature genes, removal of a signature gene having a strong correlation will likely not have a big impact on the overall GEX score. The correlations are set forth in Tables 2 and 4 and the skilled person understands that 0 indicates the absence of any linear relationship while correlations of −1 and +1 indicate, respectively, a perfect negative (inverse) or positive (direct) relationship. Thus, the skilled person can easily determine, based on Tables 2 or 4, which signature genes have comparatively stronger or weaker correlations.
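The selection of a sub-combination by correlation strength can be sketched as follows, for illustration only. The sketch computes Pearson's correlation coefficient from paired values and removes the gene or genes with the smallest absolute correlation; the function names are assumptions of this example, and the correlation values in the usage lines are hypothetical placeholders, not values from Tables 2 or 4.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


def drop_weakest(correlations, n_drop=1):
    """Remove the n_drop genes whose correlation is closest to 0.

    correlations: {gene: Pearson r with Gleason score}
    """
    ranked = sorted(correlations, key=lambda gene: abs(correlations[gene]))
    return {gene: correlations[gene] for gene in ranked[n_drop:]}


# Hypothetical correlation values, for illustration only.
example = {"HOXC6": 0.60, "UBE2C": 0.25, "CCK": -0.20, "PROK1": -0.45}
print(sorted(drop_weakest(example, n_drop=1)))  # ['HOXC6', 'PROK1', 'UBE2C']
```

Ranking by absolute value treats positive and negative correlations alike, which corresponds to selection based on overall correlation; restricting the input to negatively correlated genes gives the selection based on weakest negative correlation.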

Additionally or alternatively, a ranking based on p-value correlations or other characteristics can be used to determine a sub-combination of the 21 genes for inclusion in a collection of signature genes or for use in a method set forth herein. An exemplary ranking is shown in Table 5. The genes are ranked in Table 5 according to the probability that each is predictive of prostate cancer relapse when evaluated alone or in combination with one or more other genes in the collection of 21 genes. For example, a GEX score based on the expression level of at least one gene in the collection of 21 genes that is predictive of prostate cancer relapse is most likely to include the expression level for the MKI67 gene. Thus, a GEX score based on the expression level of the MKI67 gene alone or in combination with at least one other gene in the collection of 21 genes has the highest probability of correctly predicting prostate cancer relapse. As a further example, a GEX score based on the expression level of at least two genes in the collection of 21 genes that is predictive of prostate cancer relapse is most likely to include the expression level for the MKI67 and GI_2094528 genes. Thus, a GEX score based on the expression level of the MKI67 and GI_2094528 genes exclusively or in combination with at least one other gene in the collection of 21 genes has the highest probability of correctly predicting prostate cancer relapse. Accordingly, collections of signature genes can be built up according to their sequential occurrence in Table 5 (for example, continuing beyond the two collections set forth above, are a set including MKI67, GI_2094528, and HOXC6; a set including MKI67, GI_2094528, HOXC6 and CCK; and so forth).
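The sequential build-up of collections described above can be sketched as a list of successively longer prefixes of the ranking. Only the first four ranked genes, which are named in order in the text above, are used here; the function and variable names are assumptions of this example.

```python
# First four genes in the order described above for Table 5.
RANKED = ["MKI67", "GI_2094528", "HOXC6", "CCK"]


def nested_collections(ranked):
    """Collections built up by sequential occurrence in the ranking."""
    return [ranked[:k] for k in range(1, len(ranked) + 1)]


for collection in nested_collections(RANKED):
    print(collection)
# ['MKI67']
# ['MKI67', 'GI_2094528']
# ['MKI67', 'GI_2094528', 'HOXC6']
# ['MKI67', 'GI_2094528', 'HOXC6', 'CCK']
```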

However, as described in further detail below, a GEX score determined from the expression level of any individual gene in Table 5 or any combination of 2 to 21 genes in Table 5 can be used to predict relapse of prostate cancer so long as the gene expression pattern for the particular gene or combination of genes is correlated with Gleason score. Thus, although the best combination of gene expression levels to include in a 7 gene GEX score for prediction of prostate cancer relapse is derived from the MKI67, GI_2094528, HOXC6, CCK, memD, FBP1, and CDC6 genes, a GEX score including gene expression levels for a variety of other 7 gene combinations is also predictive.

Thus, the invention provides a method of predicting prostate cancer relapse based on the expression patterns for any subset of the 21 genes set forth in Table 5 including, for example, at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 of the 21 genes. The invention also provides a method of predicting prostate cancer relapse based on the expression patterns for any subset of the set of genes consisting of MKI67, GI_2094528, HOXC6, CCK, MEMD, FBP1, and CDC6 including, for example, at least 1, 2, 3, 4, 5 or 6 of the 7 genes. Similarly, the invention provides a collection of isolated prostate cancer signature genes comprising any subset of the set of genes consisting of MKI67, GI_2094528, HOXC6, CCK, MEMD, FBP1, and CDC6 including, for example, at least 1, 2, 3, 4, 5 or 6 of the 7 genes.
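By way of illustration only, the sequential build-up of collections from a ranked gene list described above can be sketched in Python as follows. The sketch assumes that only the first seven ranked genes named in the text are available; the function name nested_collections is hypothetical and is not part of the disclosure.

```python
from typing import List

# First seven genes in the order the text gives for the Table 5 ranking.
# The remaining 14 genes are omitted because their rank order is not stated here.
RANKED_GENES: List[str] = [
    "MKI67", "GI_2094528", "HOXC6", "CCK", "MEMD", "FBP1", "CDC6",
]


def nested_collections(ranked: List[str]) -> List[List[str]]:
    """Return the top-1, top-2, ..., top-n collections of a ranked gene list."""
    return [ranked[:k] for k in range(1, len(ranked) + 1)]


collections = nested_collections(RANKED_GENES)
for genes in collections:
    # Each collection extends the previous one by the next-ranked gene.
    print(len(genes), genes)
```

Each collection is a prefix of the ranking, so a larger collection always contains every smaller one. Subsets that do not follow the ranking order remain within the scope described above and are simply not enumerated by this sketch.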

The sensitivity and specificity of the molecular signature derived from the sets of signature genes provided by the invention provide specific and substantial utility for patients undergoing prostate biopsy for diagnosis of prostate carcinoma, based on applicability of the methods described herein to diagnosis as well as prognosis through biopsy samples. Furthermore, the present invention enables the development of a diagnostic test that is technically simple and applicable for routine clinical use, and incorporation into existing prostate cancer nomograms (Group TTABPW, Nat Rev Genet 5:229-37 (2004); Ramaswamy, N Engl J Med 350:1814-6 (2004); Sullivan Pepe et al. J Natl Cancer Inst 93:1054-61 (2001)). Particularly useful molecular signatures, or reference models derived from signatures such as expression patterns, are those that are at least as predictive of prostate cancer relapse as Gleason score in prostate cancer patients.

As described herein, archived tissue samples, in particular, formalin-fixed paraffin embedded tissue (FFPE) samples, are particularly useful for establishing a model for relapse prediction because they are generally supported by sufficient clinical follow-up data to allow retrospective studies that correlate gene expression with clinical outcome. By “archived tissue sample” herein is meant tissue samples that have been obtained from a source and preserved. Preferred methods of preservation include, but are not limited to, paraffin embedding, ethanol fixation and formalin (including formaldehyde and other derivatives) fixation as are known in the art. The sample may be temporally “old”, e.g. months or years old, or recently fixed. For example, post-surgical procedures generally include a fixation step on excised tissue for histological analysis. There are numerous tissue banks and collections comprising exhaustive samples from all stages of a wide variety of disease states, including cancer. Recently developed methods disclosed in U.S. patent application Ser. No. 10/678,608, which is incorporated herein by reference in its entirety, have made it feasible to obtain robust and reproducible gene expression patterns from archived tissues, including formalin-fixed, paraffin-embedded (FFPE) tissues. This ability to access reliable information from archived tissues enabled the discovery that is, in part, the present invention.

In a further embodiment, the invention provides a method for deriving a prostate cancer gene expression score for an individual prostate carcinoma tissue sample by (a) selecting a set of signature genes having an expression pattern that correlates with Gleason scores in prostate cancer patients; (b) independently deriving a prediction score for each of a set of signature genes known to have an expression pattern that correlates with Gleason score in prostate cancer patients, wherein the prediction score correlates gene expression of each individual signature gene with Gleason score; and (c) deriving a prostate cancer gene expression score by calculating the average of said independently derived prediction scores, wherein said prostate cancer gene expression score correlates gene expression of said set of signature genes with Gleason score.

An exemplary gene expression score and method for determining the score are provided in Example I. The gene expression score of Example I is calculated from independently derived prediction scores by averaging. In particular embodiments, a gene expression score can be calculated from the independently derived prediction scores with different weights, rather than averaging. It will be understood that any score that is capable of correlating gene expression of individual signature genes with Gleason score can be used in the invention.
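The averaging and weighted combinations described above can be illustrated with the following minimal Python sketch. It assumes the per-gene prediction scores have already been derived by some other procedure; the function name gex_score, the numeric scores, and the weights are hypothetical values chosen for illustration and are not taken from Example I.

```python
from typing import Mapping, Optional


def gex_score(prediction_scores: Mapping[str, float],
              weights: Optional[Mapping[str, float]] = None) -> float:
    """Combine per-gene prediction scores into a single GEX score.

    With no weights this is the plain average; with weights it is a
    weighted average (the weights are hypothetical, not from the source).
    """
    if not prediction_scores:
        raise ValueError("at least one prediction score is required")
    if weights is None:
        return sum(prediction_scores.values()) / len(prediction_scores)
    total_weight = sum(weights[g] for g in prediction_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(prediction_scores[g] * weights[g] for g in prediction_scores)
    return weighted_sum / total_weight


# Made-up per-gene prediction scores on a Gleason-like scale.
scores = {"MKI67": 8.0, "HOXC6": 7.0, "CCK": 6.0}
plain = gex_score(scores)                                              # 7.0
weighted = gex_score(scores, {"MKI67": 2.0, "HOXC6": 1.0, "CCK": 1.0})  # 7.25
print(plain, weighted)
```

Normalizing by the sum of the weights keeps the weighted score on the same scale as the individual prediction scores, so weighted and unweighted scores can be compared directly.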

In a related yet distinct embodiment, the invention provides a method for predicting the probability of relapse of prostate cancer in an individual by performing the following steps: (a) selecting a set of signature genes having an expression pattern that correlates with Gleason scores in prostate cancer patients; (b) independently deriving a prediction score for each signature gene, wherein the prediction score correlates gene expression of each individual signature gene with Gleason score; (c) deriving a prostate cancer gene expression score by calculating the average of the independently derived prediction scores, wherein the prostate cancer gene expression score correlates gene expression of the set of signature genes with Gleason score; and (d) comparing the cancer gene expression score to a model to determine the probability of relapse.

The invention further provides a method for predicting the probability of cancer relapse in an individual by performing the following steps: (a) providing a gene expression score derived from independently determined expression levels for a plurality of signature genes in suspected prostate cancer tissue obtained from the individual; (b) providing a model derived from the gene expression scores for a plurality of other individuals and the relapse rates for the plurality of other individuals; and (c) comparing the gene expression score to the model to determine the probability of relapse.

In a particular embodiment, the invention provides a method for predicting the probability of relapse of prostate cancer in an individual by performing the steps of (a) providing expression levels for a collection of signature genes from a test individual; (b) deriving a score that captures the expression levels for the collection of signature genes; (c) providing a model comprising information correlating the score with prostate cancer relapse; and (d) comparing the score to the reference model, thereby determining the probability of prostate cancer relapse for the individual.

In a further embodiment, the invention provides a method for predicting the probability of relapse of prostate cancer in an individual by performing the steps of (a) providing expression levels for a collection of signature genes from a test individual, wherein the collection of signature genes comprises at least two genes selected from the group consisting of GI_2094528, KIP2, NRG1, NBL1, Prostein, CCNE2, CDC6, FBP1, HOXC6, MKI67, MYBL2, PTTG1, RAMP, UBE2C, Wnt5A, MEMD, AZGP1, CCK, MLCK, PPAP2B, and PROK1; (b) deriving a score that captures the expression levels for the collection of signature genes; (c) providing a model comprising information correlating the score with prostate cancer relapse; and (d) comparing the score to the reference model, thereby determining the probability of prostate cancer relapse for the individual.
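The comparison of a score to a model in the foregoing embodiments can be illustrated, purely as a sketch, by one simple kind of model: the relapse fraction observed within score bins of a reference cohort. The binning approach, the bin edges, the function names build_model and relapse_probability, and the cohort data below are all assumptions made for illustration; the disclosure does not limit the form of the model.

```python
from bisect import bisect_right
from typing import List, Sequence


def build_model(ref_scores: Sequence[float], ref_relapsed: Sequence[bool],
                bin_edges: Sequence[float]) -> List[float]:
    """Relapse fraction observed in each score bin of a reference cohort.

    Bins are (-inf, e0), [e0, e1), ..., [e_last, +inf); empty bins give NaN.
    """
    counts = [0] * (len(bin_edges) + 1)
    relapses = [0] * (len(bin_edges) + 1)
    for score, relapsed in zip(ref_scores, ref_relapsed):
        i = bisect_right(bin_edges, score)  # index of the bin holding this score
        counts[i] += 1
        relapses[i] += int(relapsed)
    return [r / c if c else float("nan") for r, c in zip(relapses, counts)]


def relapse_probability(score: float, bin_edges: Sequence[float],
                        model: Sequence[float]) -> float:
    """Look up the reference relapse fraction for a test individual's score."""
    return model[bisect_right(bin_edges, score)]


# Made-up reference cohort: GEX scores and known relapse outcomes.
ref_scores = [5.2, 5.8, 6.1, 6.4, 6.9, 7.3, 7.8, 8.1]
ref_relapsed = [False, False, False, True, False, True, True, True]
edges = [6.0, 7.0]

model = build_model(ref_scores, ref_relapsed, edges)
print(model)
print(relapse_probability(6.5, edges, model))
```

A binned lookup is used here only because it needs no fitting procedure. A regression or survival model relating score to relapse could serve the same role, subject to the same caveat that it is not specified by the text above.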

While the present invention is disclosed and exemplified with the 21 signature genes set forth above and further in the context of correlation to Gleason score in prostate cancer, the methods are universally applicable to the diagnosis and prognosis of a broad range of cancers and other conditions. The skilled person apprised of the invention disclosed herein will appreciate that any known predictor of disease extent for any condition can be selected to establish a GEX score for prognosis of relapse that can be more accurate or sensitive than relapse prediction based solely on the known predictor.

Accordingly, the invention provides a method for preparing a model for prognosis of an individual having a disease or condition. The method can include the steps of (a) obtaining from different individuals a plurality of tissue samples suspected of having a disease or condition, wherein the clinical outcome corresponding to each of the tissue samples is known, wherein a prognostic score of the tissue samples is known, and wherein the prognostic score includes stratified grades corresponding to the clinical outcome; (b) selecting a set of signature genes having an expression pattern that correlates positively or negatively in a statistically significant manner with the prognostic score; (c) independently deriving a prediction score for each signature gene in the plurality of tissue samples, wherein the prediction score correlates gene expression of each individual signature gene with the prognostic score; (d) deriving a gene expression (GEX) score based on a combination of the independently derived prediction scores in the plurality of tissue samples, wherein the GEX score correlates gene expression of the set of signature genes with the prognostic score; and (e) correlating the GEX score with the clinical outcome for each of the tissue samples, thereby establishing a model for prognosis of an individual having the disease or condition, wherein the model provides a higher resolution correlation with the clinical outcome than the strata of the prognostic score.
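The gene selection step of the foregoing method can be illustrated with the short Python sketch below. It assumes a Pearson correlation and uses a fixed threshold on its absolute value as a simple stand-in for a formal test of statistical significance; the threshold, the function names pearson_r and select_signature_genes, the gene labels, and the expression values are all hypothetical.

```python
from math import sqrt
from typing import List, Mapping, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no correlation information
    return sxy / sqrt(sxx * syy)


def select_signature_genes(expression: Mapping[str, Sequence[float]],
                           prognostic_scores: Sequence[float],
                           min_abs_r: float = 0.8) -> List[str]:
    """Keep genes whose expression correlates positively or negatively
    with the prognostic score (threshold is an assumed proxy for significance)."""
    return [gene for gene, levels in expression.items()
            if abs(pearson_r(levels, prognostic_scores)) >= min_abs_r]


# Made-up prognostic scores and expression levels for five tissue samples.
prognostic = [5, 6, 7, 8, 9]
expression = {
    "GENE_UP": [1.0, 2.0, 3.0, 4.0, 5.0],    # rises with the prognostic score
    "GENE_DOWN": [9.0, 7.5, 6.0, 4.0, 3.0],  # falls with the prognostic score
    "GENE_FLAT": [2.0, 5.0, 1.0, 5.0, 2.0],  # unrelated to the prognostic score
}
selected = select_signature_genes(expression, prognostic)
print(selected)
```

Taking the absolute value of the coefficient retains both positively and negatively correlated genes, consistent with step (b). In practice a p-value derived from the correlation and the number of samples would replace the fixed threshold.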

Individuals suspected of having any of a variety of diseases or conditions, such as cancer, can be evaluated using a method of the invention. Exemplary cancers that can be evaluated using a method of the invention include, but are not limited to, hematopoietic neoplasms, Adult T-cell leukemia/lymphoma, Lymphoid Neoplasms, Anaplastic large cell lymphoma, Myeloid Neoplasms, Histiocytoses, Hodgkin Diseases (HD), Precursor B lymphoblastic leukemia/lymphoma (ALL), Acute myelogenous leukemia (AML), Precursor T lymphoblastic leukemia/lymphoma (ALL), Myelodysplastic syndromes, Chronic Myeloproliferative disorders, Chronic lymphocytic leukemia/small lymphocytic lymphoma (SLL), Chronic Myelogenous Leukemia (CML), Lymphoplasmacytic lymphoma, Polycythemia Vera, Mantle cell lymphoma, Essential Thrombocytosis, Follicular lymphoma, Myelofibrosis with Myeloid Metaplasia, Marginal zone lymphoma, Hairy cell leukemia, Hemangioma, Plasmacytoma/plasma cell myeloma, Lymphangioma, Glomangioma, Diffuse large B-cell lymphoma, Kaposi Sarcoma, Hemangioendothelioma, Burkitt lymphoma, Angiosarcoma, T-cell chronic lymphocytic leukemia, Hemangiopericytoma, Large granular lymphocytic leukemia, head & neck cancers, Basal Cell Carcinoma, Mycosis fungoides and Sezary syndrome, Squamous Cell Carcinoma, Ceruminoma, Peripheral T-cell lymphoma, Osteoma, Nonchromaffin Paraganglioma, Angioimmunoblastic T-cell lymphoma, Acoustic Neurinoma, Adenoid Cystic Carcinoma, Angiocentric lymphoma, Mucoepidermoid Carcinoma, NK/T-cell lymphoma, Malignant Mixed Tumors, Intestinal T-cell lymphoma, Adenocarcinoma, Malignant Mesothelioma, Fibrosarcoma, Sarcomatoid Type lung cancer, Osteosarcoma, Epithelial Type lung cancer, Chondrosarcoma, Melanoma, cancer of the gastrointestinal tract, olfactory Neuroblastoma, Squamous Cell Carcinoma, Isolated Plasmocytoma, Adenocarcinoma, Inverted Papillomas, Carcinoid, Undifferentiated Carcinoma, Malignant Melanoma, Mucoepidermoid Carcinoma, Adenocarcinoma, Acinic Cell Carcinoma, Gastric Carcinoma, Malignant Mixed Tumor, Gastric Lymphoma, Gastric Stromal Cell Tumors, Ameloblastoma, Lymphoma, Odontoma, Intestinal Stromal Cell tumors, thymus cancers, Malignant Thymoma, Carcinoids, Type I (Invasive thymoma), Malignant Mesothelioma, Type II (Thymic carcinoma), Non-mucin producing adenocarcinoma, Squamous cell carcinoma, Lymph epithelioma, cancers of the liver and biliary tract, Squamous Cell Carcinoma, Hepatocellular Carcinoma, Adenocarcinoma, Cholangiocarcinoma, Hepatoblastoma, papillary cancer, Angiosarcoma, solid Bronchioalveolar cancer, Fibrolamellar Carcinoma, Small Cell Carcinoma, Carcinoma of the Gallbladder, Intermediate Cell carcinoma, Large Cell Carcinoma, Squamous Cell Carcinoma, Undifferentiated cancer, cancer of the pancreas, cancer of the female genital tract, Squamous Cell Carcinoma, Cystadenocarcinoma, Basal Cell Carcinoma, Insulinoma, Melanoma, Gastrinoma, Fibrosarcoma, Glucagonoma, Intraepithelial Carcinoma, Adenocarcinoma Embryonal, cancer of the kidney, Rhabdomyosarcoma, Renal Cell Carcinoma, Large Cell Carcinoma, Nephroblastoma (Wilms' tumor), Neuroendocrine or Oat Cell carcinoma, cancer of the lower urinary tract, Adenosquamous Carcinoma, Urothelial Tumors, Undifferentiated Carcinoma, Squamous Cell Carcinoma, Carcinoma of the female genital tract, Mixed Carcinoma, Adenoacanthoma, Sarcoma, Small Cell Carcinoma, Carcinosarcoma, Leiomyosarcoma, Endometrial Stromal Sarcoma, cancer of the male genital tract, Serous Cystadenocarcinoma, Mucinous Cystadenocarcinoma, Sarcinoma, Endometrioid Tumors, Speretocytic Sarcinoma, Embryonal Carcinoma, Celioblastoma, Choriocarcinoma, Teratoma, Clear Cell Carcinoma, Leydig Cell Tumor, Unclassified Carcinoma, Sertoli Cell Tumor, Granulosa-Theca Cell Tumor, Sertoli-Leydig Cell Tumor, Dysgerminoma, Undifferentiated Prostatic Carcinoma, Teratoma, Ductal Transitional carcinoma, breast cancer, Phyllodes Tumor, cancer of the bones, joints and soft tissue, Paget's Disease, Multiple Myeloma, In situ Carcinoma, Malignant Lymphoma, Invasive Carcinoma, Chondrosarcoma, Mesenchymal Chondrosarcoma, cancer of the endocrine system, Osteosarcoma, Adenoma, Ewing Tumor, endocrine Carcinoma, Malignant Giant Cell Tumor, Meningioma, Adamantinoma, Craniopharyngioma, Malignant Fibrous Histiocytoma, Papillary Carcinoma, Histiocytoma, Follicular Carcinoma, Desmoplastic Fibroma, Medullary Carcinoma, Fibrosarcoma, Anaplastic Carcinoma, Chordoma, Adenoma, Hemangioendothelioma, Hemangiopericytoma, Pheochromocytoma, Liposarcoma, Neuroblastoma, Paraganglioma, Histiocytoma, Pineal cancer, Rhabdomyosarcoma, Pineoblastoma, Leiomyosarcoma, Pineocytoma, Angiosarcoma, skin cancer, cancer of the nervous system, Melanoma, Schwannoma, Squamous cell carcinoma, Neurofibroma, Basal cell carcinoma, Malignant Peripheral Nerve Sheath Tumor, Merkel cell carcinoma, Sheath Tumor, Extramammary Paget's Disease, Astrocytoma, Paget's Disease of the nipple, Fibrillary Astrocytoma, Glioblastoma Multiforme, Brain Stem Glioma, Cutaneous T-cell lymphoma, Pilocytic Astrocytoma, Xanthoastrocytoma, Histiocytosis, Oligodendroglioma, Ependymoma, Gangliocytoma, Cerebral Neuroblastoma, Central Neurocytoma, Dysembryoplastic Neuroepithelial Tumor, Medulloblastoma, Malignant Meningioma, Primary Brain Lymphoma, Primary Brain Germ Cell Tumor, cancers of the eye, Squamous Cell Carcinoma, Mucoepidermoid Carcinoma, Melanoma, Retinoblastoma, Glioma, Meningioma, cancer of the heart, Myxoma, Fibroma, Lipoma, Papillary Fibroelastoma, Rhabdomyoma, or Angiosarcoma, among others.

A prognostic score that can be used in the invention includes, for example, the Gleason score having stratified grades 1-9 that correspond to clinical outcome as set forth elsewhere herein. A further example of a prognostic score that is useful in the invention is cancer staging having stratified grades of stages 1, 2, 3 and 4. The invention can be used to provide a higher resolution correlation with the clinical outcome than the strata of the Gleason or cancer staging prognostic scores. Diseases or conditions other than cancer for which stratified grades have been correlated with clinical outcome can also be used in a method of the invention to determine a prognostic model or to determine a prognosis for an individual suspected of having the disease or condition. Exemplary clinical outcomes that can be determined from a model of the invention include, for example, relapse probability, survival rate, or time to relapse. Another clinical outcome that can be determined from a model of the invention is response to a particular course of therapy such as surgical removal of a tumor, radiation, or chemotherapy.

Any signature gene or combination of signature genes listed within Table 2 or Table 4 can be used in the methods, compositions, and kits of the present invention. Similarly, any additional gene known or discovered to be differentially expressed and correlated with Gleason score can be used in combination with some or all of the signature genes set forth within Table 2 or Table 4 and used in the methods, compositions, and kits of the present invention. In general, it is preferable to use signature genes for which the difference between the level of expression of the signature gene in prostate cancer cells or prostate-associated body fluids and the level of expression of the same signature gene in normal prostate cells or prostate-associated body fluids is as great as possible. Although the difference can be as small as the limit of detection of the method for assessing expression of the signature gene, it is preferred that the difference be at least greater than the standard error of the assessment method, and preferably a difference of at least 1.1-, 1.2-, 1.3-, 1.4-, 1.5-, 1.6-, 1.7-, 1.8-, 1.9-, 2-, 3-, 4-, 5-, 6-, 7-, 8-, 9-, 10-, 15-, 20-, 25-, 100-, 500-, 1000-fold or greater. It also is preferable to use signature genes for which the correlation with Gleason score is more rather than less significant.

The skilled person will appreciate that patient tissue samples containing prostate cells or prostate cancer cells may be used in the methods of the present invention including, but not limited to, those aimed at predicting relapse probability. In these embodiments, the level of expression of the signature gene can be assessed by assessing the amount, e.g., absolute amount or concentration, of a signature gene product (e.g., protein and RNA transcript encoded by the signature gene and fragments of the protein and RNA transcript) in a sample, e.g., stool and/or blood obtained from a patient. The sample can, of course, be subjected to a variety of well-known post-collection preparative and storage techniques (e.g., fixation, storage, freezing, lysis, homogenization, DNA or RNA extraction, ultrafiltration, concentration, evaporation, centrifugation, etc.) prior to assessing the amount of the signature gene product in the sample.

In the methods of the invention aimed at preparing a model for prostate cancer relapse prediction, it is understood that the particular clinical outcome associated with each sample contributing to the model must be known. Consequently, the model can be established using archived tissues. In the methods of the invention aimed at preparing a model for prostate cancer relapse prediction, total RNA is generally extracted from the source material of interest, generally an archived tissue such as a formalin-fixed, paraffin-embedded tissue, and subsequently purified. Methods for obtaining robust and reproducible gene expression patterns from archived tissues, including formalin-fixed, paraffin-embedded (FFPE) tissues, are taught in United States Patent Publication 2004/0259105, which is incorporated herein by reference in its entirety. Commercial kits and protocols for RNA extraction from FFPE tissues are available including, for example, ROCHE High Pure RNA Paraffin Kit (Roche); MasterPure™ Complete DNA and RNA Purification Kit (EPICENTRE®, Madison, Wis.); Paraffin Block RNA Isolation Kit (Ambion, Inc.); and RNeasy™ Mini kit (Qiagen, Chatsworth, Calif.).

The use of FFPE tissues as a source of RNA for RT-PCR has been described previously (Stanta et al., Biotechniques 11:304-308 (1991); Stanta et al., Methods Mol. Biol. 86:23-26 (1998); Jackson et al., Lancet 1:1391 (1989); Jackson et al., J. Clin. Pathol. 43:499-504 (1999); Finke et al., Biotechniques 14:448-453 (1993); Goldsworthy et al., Mol. Carcinog. 25:86-91 (1999); Stanta and Bonin, Biotechniques 24:271-276 (1998); Godfrey et al., J. Mol. Diagnostics 2:84 (2000); Specht et al., J. Mol. Med. 78:B27 (2000); Specht et al., Am. J. Pathol. 158:419-429 (2001)). For quick analysis of the RNA quality, RT-PCR can be performed utilizing a pair of primers targeting a short fragment in a highly expressed gene, for example, actin, ubiquitin, gapdh or another well-described, commonly used housekeeping gene. If the cDNA synthesized from the RNA sample can be amplified using this pair of primers, then the sample is suitable for quantitative measurement of RNA target sequences by any method preferred, for example, the DASL assay, which requires only a short cDNA fragment for the annealing of query oligonucleotides.

There are numerous tissue banks and collections including exhaustive samples from all stages of a wide variety of disease states, most notably cancer. The ability to perform genotyping and/or gene expression analysis, including both qualitative and quantitative analysis, on these samples enables the application of this methodology to the methods of the invention. In particular, the ability to establish a correlation of gene expression and a known predictor of disease extent and/or outcome by probing the genetic state of tissue samples for which clinical outcome is already known allows for the establishment of a correlation between a particular molecular signature and the known predictor, such as a Gleason score, to derive a GEX score that allows for a more sensitive prognosis than that based on the known predictor alone. The skilled person will appreciate that by building databases of molecular signatures from tissue samples of known outcomes, many such correlations can be established, thus allowing both diagnosis and prognosis of any condition.

Tissue samples useful for preparing a model for prostate cancer relapse prediction include, for example, paraffin and polymer embedded samples, ethanol embedded samples and/or formalin and formaldehyde embedded tissues, although any suitable sample may be used. In general, nucleic acids isolated from archived samples can be highly degraded and the quality of the nucleic acid preparation can depend on several factors, including the sample shelf life, fixation technique and isolation method. However, using the methodologies taught in United States Patent Publication 2004/0259105, which have the significant advantage that short or degraded targets can be used for analysis as long as the sequence is long enough to hybridize with the oligonucleotide probes, highly reproducible results can be obtained that closely mimic results found in fresh samples.

Archived tissue samples, which can be used for all methods of the invention, typically have been obtained from a source and preserved. Preferred methods of preservation include, but are not limited to, paraffin embedding, ethanol fixation and formalin (including formaldehyde and other derivatives) fixation as are known in the art. A tissue sample may be temporally “old”, e.g. months or years old, or recently fixed. For example, post-surgical procedures generally include a fixation step on excised tissue for histological analysis. In a preferred embodiment, the tissue sample is a diseased tissue sample, particularly a prostate cancer tissue, including primary and secondary tumor tissues as well as lymph node tissue and metastatic tissue.

Thus, an archived sample can be heterogeneous and encompass more than one cell or tissue type, for example, tumor and non-tumor tissue. Preferred tissue samples include solid tumor samples including, but not limited to, tumors of the prostate. It is understood that in applications of the present invention to conditions other than prostate cancer the tumor source can be brain, bone, heart, breast, ovaries, prostate, uterus, spleen, pancreas, liver, kidneys, bladder, stomach and muscle. Similarly, depending on the condition, suitable tissue samples include, but are not limited to, bodily fluids (including, but not limited to, blood, urine, serum, lymph, saliva, anal and vaginal secretions, perspiration and semen) of virtually any organism, with mammalian samples being preferred and human samples being particularly preferred. In embodiments directed to methods of establishing a model for relapse prediction, the tissue sample is one for which patient history and outcome are known. Generally, the invention methods can be practiced with the signature gene sequence contained in an archived sample or can be practiced with signature gene sequences that have been physically separated from the sample prior to performing a method of the invention.

If required, a nucleic acid sample having the signature gene sequence(s) is prepared using known techniques. For example, the sample can be treated to lyse the cells, using known lysis buffers, sonication, electroporation, etc., with purification and amplification as outlined below occurring as needed, as will be appreciated by those in the art. In addition, the reactions can be accomplished in a variety of ways, as will be appreciated by those in the art. Components of the reaction may be added simultaneously, or sequentially, in any order, with preferred embodiments outlined below. In addition, the reaction can include a variety of other reagents which can be useful in the assays. These include reagents like salts, buffers, neutral proteins, e.g. albumin, detergents, etc., which may be used to facilitate optimal hybridization and detection, and/or reduce non-specific or background interactions. Also, reagents that otherwise improve the efficiency of the assay, such as protease inhibitors, nuclease inhibitors, anti-microbial agents, etc., can be used, depending on the sample preparation methods and purity.

In a preferred embodiment, mRNA is isolated from paraffin embedded samples as is known in the art. Preferred methods include the use of the Paraffin Block RNA Isolation Kit by Ambion (Catalog number 1902, the instruction manual of which is incorporated herein by reference) or the High Pure RNA Paraffin Kit by Roche (cat #3270289). Samples of mRNA can be obtained from other samples using methods known in the art including, for example, those described in Sambrook et al., Molecular Cloning: A Laboratory Manual, 3rd edition, Cold Spring Harbor Laboratory, New York (2001) or in Ausubel et al., Current Protocols in Molecular Biology, John Wiley and Sons, Baltimore, Md. (1998), or those that are commercially available such as the Invitrogen PureLink miRNA isolation kit (cat #K1570) or mRNA isolation kits from Ambion (Austin, Tex.). Once prepared, mRNA or other nucleic acids are analyzed by methods known to those of skill in the art. The nucleic acid sequence corresponding to a signature gene can be any length, with the understanding that longer sequences are more specific. Recently developed methods for obtaining robust and reproducible gene expression patterns from archived tissues, including formalin-fixed, paraffin-embedded (FFPE) tissues, as taught in United States Patent Application Publication No. 2004/0259105, have the significant advantage that short or degraded targets can be used for analysis as long as the sequence is long enough to hybridize with the oligonucleotide probes. Thus, even degraded target nucleic acids can be analyzed. Preferably a nucleic acid corresponding to a signature gene is at least 20 nucleotides in length. Preferred ranges are from 20 to 100 nucleotides in length, with from 30 to 60 nucleotides being more preferred and from 40 to 50 being most preferred.

In addition, when nucleic acids are to be detected, preferred methods utilize cutting or shearing techniques to cut the nucleic acid sample containing the target sequence into a size that will facilitate handling and hybridization to the target. This can be accomplished by shearing the nucleic acid through mechanical forces (e.g. sonication) or by cleaving the nucleic acid using restriction endonucleases, or any other methods known in the art. However, in most cases, the natural degradation that occurs during archiving results in “short” oligonucleotides. In general, the methods of the invention can be done on oligonucleotides as short as 20-100 base pairs, with from 20 to 50 being preferred, and between 40 and 50, including 44, 45, 46, 47, 48 and 49, being the most preferred.

Tissue samples useful in a method of the invention for deriving a prostate cancer gene expression score or in a method of the invention for predicting the probability of relapse of prostate cancer in an individual include the tissue samples described above as useful for preparing a model for prostate cancer relapse prediction, but also include fresh and non-archived samples. Unlike tissue samples useful for preparing a model for prostate cancer relapse prediction, the tissues used for deriving a prostate cancer GEX score or in a method of the invention for predicting the probability of relapse of prostate cancer in an individual do not, for obvious reasons, have the requirement that clinical outcome be known. Consequently, freshly obtained tissue samples can also be used in the individual methods for GEX score determination and relapse prediction, such as in a prospective study or clinical trial.

The methods of the invention depend on the detection of differentially expressed genes for expression profiling across heterogeneous tissues. Thus, the methods depend on profiling genes whose expression in certain tissues is activated to a higher or lower level in an individual afflicted with a condition, for example, cancer, such as prostate cancer, relative to its expression in non-cancerous tissues or in a control subject. Gene expression can be activated to a higher or lower level at different stages of the same condition, and a differentially expressed gene can be either activated or inhibited at the nucleic acid level or protein level, or may be subject to alternative splicing to result in a different polypeptide product. Such differences can be evidenced by a change in mRNA levels, surface expression, secretion or other partitioning of a polypeptide, for example. For the purpose of this invention, differential gene expression is considered to be present when there is at least about 1.1-fold, 1.2-fold, 1.3-fold, 1.4-fold, 1.5-fold, 1.6-fold, 1.7-fold, 1.8-fold, 1.9-fold, or two-fold, preferably at least about four-fold, more preferably at least about six-fold, most preferably at least about ten-fold difference between the expression of a given gene in normal and diseased tissues.
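The fold-difference criterion described above can be illustrated with the following short Python sketch. It assumes positive, already normalized expression levels and treats the fold difference symmetrically, so that both higher and lower expression in diseased tissue count; the function names fold_difference and is_differentially_expressed, the default threshold, and the numeric values are hypothetical.

```python
def fold_difference(diseased: float, normal: float) -> float:
    """Symmetric fold difference (always >= 1) between two positive levels."""
    if diseased <= 0 or normal <= 0:
        raise ValueError("expression levels must be positive")
    return max(diseased, normal) / min(diseased, normal)


def is_differentially_expressed(diseased: float, normal: float,
                                min_fold: float = 2.0) -> bool:
    """True when the fold difference meets the chosen threshold.

    The default of 2.0 is one of the thresholds recited in the text;
    any other recited value (1.1, 4, 6, 10, ...) can be passed instead.
    """
    return fold_difference(diseased, normal) >= min_fold


# Made-up normalized expression levels for one gene.
print(fold_difference(400.0, 100.0))                            # 4.0 (up in disease)
print(fold_difference(100.0, 400.0))                            # 4.0 (down in disease)
print(is_differentially_expressed(150.0, 100.0))                # False at 2-fold
print(is_differentially_expressed(150.0, 100.0, min_fold=1.1))  # True at 1.1-fold
```

Dividing the larger level by the smaller one means a gene that is four-fold lower in diseased tissue is treated the same as one that is four-fold higher, which matches the statement that expression may be activated to a higher or lower level.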

Differential signature gene expression can be identified or confirmed using methods known in the art such as quantitative RT-PCR. In particular embodiments, differential signature gene expression can be identified or confirmed using microarray techniques. Thus, the signature genes can be measured in either fresh or paraffin-embedded tumor tissue, using microarray technology. In this method, polynucleotide sequences of interest are plated, or arrayed, on a microchip substrate. The arrayed sequences are then hybridized with specific DNA probes from cells or tissues of interest. In a preferred embodiment the technology combines fiber optic bundles and beads that self-assemble into an array. Each fiber optic bundle contains thousands to millions of individual fibers depending on the diameter of the bundle. Sensors are affixed to each bead in a given batch. The particular molecules on a bead define that bead's function as a sensor. To form an array, fiber optic bundles are dipped into pools of coated beads. The coated beads are drawn into the wells, one bead per well, on the end of each fiber in the bundle. The present invention is not limited to the solid supports described above. Indeed, a variety of other solid supports are contemplated including, but not limited to, glass microscope slides, glass wafers, gold, silicon, microchips, and other plastic, metal, ceramic, or biological surfaces. Microarray analysis can be performed by commercially available equipment, following manufacturer's protocols, such as by using Illumina's technology.

Exemplary arrays that are useful include, without limitation, a Sentrix® Array or Sentrix® BeadChip Array available from Illumina®, Inc. (San Diego, Calif.) or others including beads in wells such as those described in U.S. Pat. Nos. 6,266,459, 6,355,431, 6,770,441, and 6,859,570; and PCT Publication No. WO 00/63437, each of which is hereby incorporated by reference. Other arrays having particles on a surface include those set forth in US 2005/0227252; US 2006/0023310; US 2006/006327; US 2006/0071075; US 2006/0119913; U.S. Pat. No. 6,489,606; U.S. Pat. No. 7,106,513; U.S. Pat. No. 7,126,755; U.S. Pat. No. 7,164,533; WO 05/033681; and WO 04/024328, each of which is hereby incorporated by reference.

An array of beads useful in the invention can also be in a fluid format such as a fluid stream of a flow cytometer or similar device. Exemplary formats that can be used in the invention to distinguish beads in a fluid sample using microfluidic devices are described, for example, in U.S. Pat. No. 6,524,793. Commercially available fluid formats for distinguishing beads include, for example, those used in XMAP™ technologies from Luminex or MPSS™ methods from Lynx Therapeutics.

Further examples of commercially available microarrays that can be used in the invention include, for example, an Affymetrix® GeneChip® microarray or other microarray synthesized in accordance with techniques sometimes referred to as VLSIPS™ (Very Large Scale Immobilized Polymer Synthesis) technologies as described, for example, in U.S. Pat. Nos. 5,324,633; 5,744,305; 5,451,683; 5,482,867; 5,491,074; 5,624,711; 5,795,716; 5,831,070; 5,856,101; 5,858,659; 5,874,219; 5,968,740; 5,974,164; 5,981,185; 5,981,956; 6,025,601; 6,033,860; 6,090,555; 6,136,269; 6,022,963; 6,083,697; 6,291,183; 6,309,831; 6,416,949; 6,428,752 and 6,482,591, each of which is hereby incorporated by reference.

A spotted microarray can also be used in a method of the invention. An exemplary spotted microarray is a CodeLink™ Array available from Amersham Biosciences. Another microarray that is useful in the invention is one that is manufactured using inkjet printing methods such as SurePrint™ Technology available from Agilent Technologies. Other microarrays that can be used in the invention include, without limitation, those described in Butte, Nature Reviews Drug Discov. 1:951-60 (2002) or U.S. Pat. Nos. 5,429,807; 5,436,327; 5,561,071; 5,583,211; 5,658,734; 5,837,858; 5,919,523; 6,287,768; 6,287,776; 6,288,220; 6,297,006; 6,291,193; and 6,514,751; and WO 93/17126; WO 95/35505, each of which is hereby incorporated by reference.

DASL can be used for quantitative measurements of RNA target sequences as well as for DNA target sequences. DASL is described, for example, in Fan et al., Genome Res. 14:878-85 (2004); US 2003/0108900 and US 2004/0259105, each of which is incorporated herein by reference. Notably, the sensitivity of DASL using RNA from paraffin samples is about 80% compared to the assay using RNA prepared from fresh frozen samples, with results up to 90% sensitivity observed. Gene expression can be monitored and compared in formalin-fixed, paraffin-embedded clinical samples archived for more than 5 years.

The expression patterns for signature genes are determined based on quantitative detection of nucleic acids or oligonucleotides corresponding to the signature genes, where a nucleic acid or oligonucleotide means at least two nucleotides covalently linked together. Thus, the invention also provides a collection of nucleic acids and oligonucleotides that correspond to a signature gene or a set of signature genes. A nucleic acid useful in the methods of the invention will generally contain phosphodiester bonds, although in some cases, nucleic acid analogs are included that may have alternate backbones, including, for example, phosphoramide (Beaucage et al., Tetrahedron 49(10):1925 (1993) and references therein; Letsinger, J. Org. Chem. 35:3800 (1970); Sprinzl et al., Eur. J. Biochem. 81:579 (1977); Letsinger et al., Nucl. Acids Res. 14:3487 (1986); Sawai et al., Chem. Lett. 805 (1984), Letsinger et al., J. Am. Chem. Soc. 110:4470 (1988); and Pauwels et al., Chemica Scripta 26:141 (1986)), phosphorothioate (Mag et al., Nucleic Acids Res. 19:1437 (1991); and U.S. Pat. No. 5,644,048), phosphorodithioate (Briu et al., J. Am. Chem. Soc. 111:2321 (1989)), O-methylphosphoroamidite linkages (see Eckstein, Oligonucleotides and Analogues: A Practical Approach, Oxford University Press), and peptide nucleic acid backbones and linkages (see Egholm, J. Am. Chem. Soc. 114:1895 (1992); Meier et al., Chem. Int. Ed. Engl. 31:1008 (1992); Nielsen, Nature, 365:566 (1993); Carlsson et al., Nature 380:207 (1996), all of which are incorporated by reference). Other analog nucleic acids include those with positive backbones (Dempcy et al., Proc. Natl. Acad. Sci. USA 92:6097 (1995)); non-ionic backbones (U.S. Pat. Nos. 5,386,023, 5,637,684, 5,602,240, 5,216,141 and 4,469,863; Kiedrowski et al., Angew. Chem. Intl. Ed. English 30:423 (1991); Letsinger et al., J. Am. Chem. Soc. 110:4470 (1988); Letsinger et al., Nucleoside & Nucleotide 13:1597 (1994); Chapters 2 and 3, ACS Symposium Series 580, “Carbohydrate Modifications in Antisense Research”, Ed. Y. S. Sanghvi and P. Dan Cook; Mesmaeker et al., Bioorganic & Medicinal Chem. Lett. 4:395 (1994); Jeffs et al., J. Biomolecular NMR 34:17 (1994); Tetrahedron Lett. 37:743 (1996)) and non-ribose backbones, including those described in U.S. Pat. Nos. 5,235,033 and 5,034,506, and Chapters 6 and 7, ACS Symposium Series 580, “Carbohydrate Modifications in Antisense Research”, Ed. Y. S. Sanghvi and P. Dan Cook. Nucleic acids containing one or more carbocyclic sugars are also included within the definition of nucleic acids (see Jenkins et al., Chem. Soc. Rev. (1995) pp 169-176). Several nucleic acid analogs are described in Rawls, C & E News Jun. 2, 1997 page 35. Modifications of the ribose-phosphate backbone may be done to facilitate the addition of labels, or to increase the stability and half-life of such molecules in physiological environments. Nucleic acid analogs can find use in the methods of the invention as well as mixtures of naturally occurring nucleic acids and analogs.

The nucleic acids corresponding to signature genes can be single stranded or double stranded, as specified, or contain portions of both double stranded and single stranded sequence. The nucleic acid can be DNA, both genomic and cDNA, RNA or a hybrid, where the nucleic acid contains any combination of deoxyribo- and ribo-nucleotides, and any combination of bases, including, for example, uracil, adenine, thymine, cytosine, guanine, inosine, xanthine, hypoxanthine, isocytosine, and isoguanine. A nucleic acid sequence corresponding to a signature gene can be a portion of the gene, a regulatory sequence, genomic DNA, cDNA, RNA including mRNA and rRNA, or others.

A nucleic acid sequence corresponding to a signature gene can be derived from the tissue sample, or from a secondary source such as a product of a reaction such as, for example, a detection sequence from an invasive cleavage reaction, a ligated probe from an OLA or DASL reaction, an extended probe from a PCR reaction, or PCR amplification product (“amplicon”). Exemplary methods for preparing secondary probes from target sequences are described in US 2003/0108900; US 2003/0170684; US 2003/0215821; US 2004/0121364; and US 2005/0181394. Thus, a nucleic acid sequence corresponding to a signature gene can be derived from the primary or from a secondary source of nucleic acid.

As will be appreciated by those in the art, a complementary nucleic acid sequence useful in the methods of the invention can take many forms, and probes are made to hybridize to nucleic acid sequences to determine the presence or absence of the signature gene in a sample. In a preferred embodiment, a plurality of nucleic acid sequences is detected. As used herein, “plurality” or grammatical equivalents refers to at least 2, 10, 20, 25, 50, 100 or 200 different nucleic acid sequences, while at least 500 different nucleic acid sequences is preferred. More preferred is at least 1000, with more than 5000 or 10,000 particularly preferred and more than 50,000 or 100,000 most preferred. Detection can be performed on a variety of platforms such as those set forth above or in the Examples.

The expression level of a signature gene in a tissue sample can be determined by contacting nucleic acid molecules derived from the tissue sample with a set of probes under conditions where perfectly complementary probes form a hybridization complex with the nucleic acid sequences corresponding to the signature genes, each of the probes including at least two universal priming sites and a signature gene target-specific sequence; amplifying the probes forming the hybridization complexes to produce amplicons; detecting the amplicons, wherein the detection of the amplicons indicates the presence of the nucleic acid sequences corresponding to the signature gene in the tissue sample; and determining the expression level of the signature gene.

In the context of the present invention, multiplexing refers to the detection, analysis or amplification of a plurality of nucleic acid sequences corresponding to the signature genes. In one embodiment multiplex refers to the number of nucleic acid sequences corresponding to a signature gene to be analyzed in a single reaction, vessel or step. The multiplexing method is useful for detection of a single nucleic acid sequence corresponding to a signature gene as well as a plurality of nucleic acid sequences corresponding to a set of signature genes. In addition, as described below, the methods of the invention can be performed simultaneously and in parallel in a large number of tissue samples.

The expression level of nucleic acid sequences corresponding to a set of signature genes in a tissue sample can be determined by contacting nucleic acid molecules derived from the tissue sample with a set of probes under conditions where complementary probes form a hybridization complex with the signature gene-specific nucleic acid sequences, each of the probes including at least two universal priming sites and a signature gene-specific nucleic acid sequence; amplifying the probes forming the hybridization complexes to produce amplicons; detecting the amplicons, wherein the detection of the amplicons indicates the presence of the nucleic acid sequences corresponding to the set of signature genes in the tissue sample; and determining the expression level of the target sequences, wherein the expression of at least two, at least three, or at least five signature gene-specific sequences is detected.

The presence of one, two or a plurality of nucleic acid sequences corresponding to a set of signature genes can be determined in a tissue sample using single, double or multiple probe configurations. The methods of the invention can be practiced with tissue samples having substantially degraded nucleic acids. Although methods for pre-qualifying samples with respect to nucleic acid degradation are described above, those skilled in the art will recognize that other detection methods described herein or known in the art can be used to detect RNA levels in a sample suspected of having degraded nucleic acids, thereby determining the level of nucleic acid degradation in accordance with the invention.

The present invention particularly draws on methodologies outlined in US 2003/0215821; US 2004/0018491; US 2003/0036064; US 2003/0211489, each of which is expressly incorporated by reference in its entirety. In addition, universal priming methods are described in detail in US 2002/0006617; US 2002/0132241, each of which is expressly incorporated herein by reference. In addition, multiplex methods are described in detail in US 2003/0211489; US 2003/0108900, each of which is expressly incorporated herein by reference. In general, the methods of the invention can be performed in a variety of ways, as further described below and in the cited applications incorporated by reference. For example, mRNA signature samples can initially be subjected to a “complexity reduction” step, whereby the presence of a particular target is confirmed by adding probes that are enzymatically modified in the presence of the signature gene-specific nucleic acid sequence. The modified probes are then amplified and detected in a wide variety of ways. Preferred embodiments draw on multiplexing methods, which allow for the simultaneous detection of a number of nucleic acid sequences, for example, corresponding to a set of signature genes, as well as multiplexing amplification reactions, for example by using universal priming sequences to do multiplex PCR reactions. If desired, the initial step also can be both a complexity reduction and an amplification step.

“Nucleic acid sequence” or grammatical equivalents herein referred to as corresponding to a signature gene means the order and type of nucleotides in a single strand of nucleic acid. The nucleic acid sequence can be a portion of a gene, a regulatory sequence, genomic DNA, cDNA, RNA including mRNA and rRNA, or others. A preferred embodiment utilizes mRNA as the primary target sequence. As is outlined herein, the nucleic acid sequence can be a sequence from a sample, or a secondary target such as, for example, a product of a reaction such as a detection sequence from an invasive cleavage reaction, a ligated probe from an OLA or DASL reaction, an extended probe from a PCR reaction, or PCR amplification product (“amplicon”). A nucleic acid sequence corresponding to a signature gene can be any length, with the understanding that longer sequences are more specific. Probes are made to hybridize to nucleic acid sequences to determine the presence or absence of expression of a signature gene in a sample.

The invention also provides a collection of isolated probes specific for prostate cancer signature genes consisting essentially of probes specific for GI_2094528, KIP2, NRG1, NBL1, Prostein, CCNE2, CDC6, FBP1, HOXC6, MKI67, MYBL2, PTTG1, RAMP, UBE2C, Wnt5A, MEMD, AZGP1, CCK, MLCK, PPAP2B and PROK1. The invention also provides a collection of isolated probes specific for prostate cancer signature genes consisting of probes specific for GI_2094528, KIP2, NRG1, NBL1, Prostein, CCNE2, CDC6, FBP1, HOXC6, MKI67, MYBL2, PTTG1, RAMP, UBE2C, Wnt5A, MEMD, AZGP1, CCK, MLCK, PPAP2B and PROK1. The invention also provides a collection of isolated probes specific for prostate cancer signature genes comprising probes specific for GI_2094528, KIP2, NRG1, NBL1, Prostein, CCNE2, CDC6, FBP1, HOXC6, MKI67, MYBL2, PTTG1, RAMP, UBE2C, Wnt5A, MEMD, AZGP1, CCK, MLCK, PPAP2B and PROK1. Also provided is a collection of isolated probes specific for prostate cancer signature genes comprising probes specific for a subset of the collection consisting of GI_2094528, KIP2, NRG1, NBL1, Prostein, CCNE2, CDC6, FBP1, HOXC6, MKI67, MYBL2, PTTG1, RAMP, UBE2C, Wnt5A, MEMD, AZGP1, CCK, MLCK, PPAP2B and PROK1. Exemplary subsets include those set forth elsewhere herein.

Thus, the invention also provides a collection of isolated probes specific for prostate cancer signature genes including any subset of the 21 genes set forth in Table 5 including, for example, at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 of the 21 genes. The invention also provides a collection of isolated probes specific for prostate cancer signature genes including any subset of the set of genes consisting of MKI67, GI_2094528, HOXC6, CCK, memD, FBP1, and CDC6 including, for example, at least 2, 3, 4, 5 or 6 of the 7 genes.

A method of the invention can further include a step of producing a report identifying, for example, a reference model, a set of signature genes, a prediction score, or a GEX score. The report can include data obtained from a method of the invention in a format that can be subsequently analyzed to identify a reference model, a set of signature genes, a prediction score, or a GEX score. Thus, the invention further provides a report of at least one result obtained by a method of the invention. A report of the invention can be in any of a variety of recognizable formats including, for example, an electronic transmission, computer readable memory, an output to a computer graphical user interface, compact disk, magnetic disk or paper. Other formats suitable for communication between humans, machines or both can be used for a report of the invention. The methods of the invention can, in part, be conveniently performed on a computer apparatus. Performing one or more steps of an invention method on a computer apparatus is particularly useful when analyzing a large number of parameters such as a large number of tissue samples.

In one embodiment, the invention provides a diagnostic method of assessing whether a patient who has had prostate cancer has a higher than normal risk for recurrence of the prostate cancer or other cancer, including the step of comparing the GEX score calculated by a method of the invention in a patient sample to a model for relapse prediction prepared by a method of the invention. A similar GEX score in the patient sample as compared to the model provides a more accurate relapse predictor than Gleason score alone.

In one embodiment, the invention provides a diagnostic method of assessing whether a patient has a higher than normal risk of having or developing a prostate cancer. Small samples, such as those obtained using LCM in needle biopsies, with or without tumor glands, can be used to assess outcomes prior to definitive therapy. This test can give information on diagnosis as well as prognosis through needle biopsy samples. Such a test provides a diagnostic test for routine clinical use.

The invention includes compositions, kits, and methods for assessing the probability of relapse of cancer for an individual from which a sample is obtained. The sample can be, for example, an archived tissue sample or a sample obtained from a patient. Where necessary, the compositions, kits, and methods are adapted for use with samples other than patient samples. For example, when the sample to be used is a paraffinized, archived human tissue sample, it can be necessary to adjust the ratio of compounds in the compositions of the invention, in the kits of the invention, or the methods used to assess levels of gene expression in the sample. Such methods are well known in the art and within the skill of the ordinary artisan. A kit is any manufacture (e.g. a package or container) including at least one reagent, e.g. a probe, for specifically detecting the expression of a signature gene of the invention. The kit may be promoted, distributed, or sold as a unit for performing the methods of the present invention. It is recognized that the compositions, kits, and methods of the invention will be of particular utility to patients having a history of prostate cancer and their medical advisors.

The practice of the present invention employs, unless otherwise indicated, conventional techniques of molecular biology (including recombinant techniques), microbiology, cell biology, and biochemistry, which are within the skill of the art. Such techniques are explained in the literature, such as, “Molecular Cloning: A Laboratory Manual”, Second edition (Sambrook et al., 1989); “Oligonucleotide Synthesis” (M. J. Gait, ed., 1984); “Animal Cell Culture” (R. I. Freshney, ed., 1987); “Methods in Enzymology” (Academic Press, Inc.); “Handbook of Experimental Immunology”, Fourth edition (D. M. Weir & C. C. Blackwell, eds., Blackwell Science Inc., 1987); “Gene Transfer Vectors for Mammalian Cells” (J. M. Miller & M. P. Calos, eds., 1987); “Current Protocols in Molecular Biology” (F. M. Ausubel et al., eds., 1987); and “PCR: The Polymerase Chain Reaction”, (Mullis et al., eds., 1994).

Although the use of the 21 genes, and subsets thereof, has been exemplified with respect to prognosis and diagnosis methods utilizing expression levels of mRNA species produced by these genes, it will be understood that similar diagnostic and prognostic methods can utilize other measures such as methylation levels for the genes which can be correlated with expression levels or a measure of the level or activities of the protein products of the genes. Methylation can be determined using methods known in the art such as those set forth in U.S. Pat. No. 6,200,756 or US 2003/0170684, each of which is incorporated herein by reference. The level and activity of proteins can be determined using methods known in the art such as antibody detection techniques or enzymatic assays particular to the activity being evaluated. Furthermore, prognosis or diagnosis can be based on the presence of mutations or polymorphisms identified in the genes that affect expression of the gene or activity of the protein product.

It is understood that modifications which do not substantially affect the activity of the various embodiments of this invention are also included within the definition of the invention provided herein. Accordingly, the following examples are intended to illustrate but not limit the present invention.

Example I Correlation Between Gene Expression Signature and Gleason Score

This example shows the calculation of a Gene EXpression score (GEX), which is an expression analogy of Gleason score derived from 16 genes and which showed a better correlation (r=0.62) with Gleason score than did any individual gene.

A candidate gene approach focused on a set of “informative” genes that are the most relevant to the subjects of this study was taken, as has been described (Lossos et al., N Engl J Med 350:1828-37 (2004); Paik et al., 2004; Ramaswamy et al., J Clin Oncol 20:1932-41 (2002); van de Vijver et al., N Engl J Med 347:1999-2009 (2002)). The genes were selected based on: (1) Biological relevance. These include tumor suppressor genes and oncogenes, genes that are indirectly involved in cancer development, for example, DNA repair genes; metastasis-inhibitor genes, genes regulated by various signaling pathways, and/or responsible for altered cell growth and differentiation, apoptosis; or genes considered to be targets for oncogenic transformation. (2) Publicly reported lists of genes differentially expressed in prostate cancer as described (Bettuzzi et al., Cancer Res 63:3469-72 (2003); Dhanasekaran et al., Nature 412:822-6 (2001); Ernst et al., Am J Pathol 160:2169-80 (2002); Febbo and Sellers, J Urol 170:S11-9; discussion S19-20 (2003); Glinsky et al., J Clin Invest 113:913-23 (2004); Henshall et al., Cancer Res 63:4196-203 (2003); Lapointe et al., Proc Natl Acad Sci USA 101:811-6 (2004); Latil et al., Clin Cancer Res 9:5477-85 (2003); Luo et al., Prostate 51:189-200 (2002); Nelson et al., N Engl J Med 349:366-81 (2003); Ramaswamy et al., Nat Genet 33:49-54 (2003); Singh et al., Cancer Cell 1:203-9 (2002); Stamey et al., J Urol 166:2171-7 (2001); Stuart et al., Proc Natl Acad Sci USA 101:615-20 (2004); van 't Veer et al., Nature 415:530-6 (2002); Welsh et al., Cancer Res 61:5974-8 (2001)). As set forth in Table 3, a list of 512 genes was selected from these gene lists based on their overlapping occurrences among the studies, differential expression levels and biological relevance. In addition, 20 negative controls were included, designed to target sequences which are not present in the human genome and used to assess assay specificity.

The RNA samples assayed in this study were extracted from FFPE tissues. 200 ng of total RNA isolated from each FFPE tissue block was converted into cDNA, and two independent DASL assays were performed for each RNA sample as described (Fan et al., Genome Res. 14:878-85 (2004)).

Briefly, surgically removed specimens (radical prostatectomy specimens) were processed under routine pathological protocol, and examined by at least two pathologists. A study number was assigned to the specimen and the patient identification information (names and hospital identification number) was also recorded at the time of specimen retrieval. These data were stored in a Microsoft Excel and Access database. The specimens usually were received in the pathology laboratory fresh within 45 minutes of removal. Each specimen was fixed in 10% buffered formalin overnight.

Representative sections were submitted for tissue processing and paraffin embedding. 5-μm thick sections were made for routine hematoxylin and eosin stains. Specific tissue blocks that included areas of carcinoma were selected for RNA extraction. RNA was procured from FFPE cancer tissue and nearby non-cancerous tissue. For each pathologic tissue, the percentage of tumor content was estimated and used as a reference for gene expression pattern analysis. RNA was extracted from four to five 5-μm sections using an RNA extraction kit (Roche High Pure RNA Paraffin kit), yielding 0.5-3 μg of total RNA.

A total of 71 tissue samples were entered into this study (Table 1). These consisted of 29 cases of prostate carcinoma from low risk group patients, 26 cases from intermediate risk group patients, and 16 cases from high risk group patients. Low risk group patients had a serum PSA ≤10 ng/ml, Gleason summary score ≤6 and a digital rectal examination (DRE) of T1c/T2a. Intermediate risk group patients had a serum PSA 10-20 ng/ml, Gleason summary score of 7 and DRE T2b/T2c. The high risk group patients had a serum PSA >20 ng/ml, Gleason summary score of 8-10 and DRE T3a/T3b. All tumor blocks contained at least 10% of malignant glands. In addition, 34 matched, non-tumor prostate tissues were used as controls. These tissues were various compositions of inflammation, stroma, benign glandular hyperplasia, and glandular atrophy.

Data regarding tissue samples were initially stored in a Microsoft Access file detailing the location, specimen number, and pathologic diagnosis. Approval was obtained from the UCSD IRB (#040487X) to study patients' existing tissue materials and to review pertinent medical records. The clinical information was de-identified from the original patient identification prior to data analysis so that the findings cannot be traced back to the original patient.

To identify genes that are prognostically significant, patient history was provided for clinical correlation. Review of clinical data included demographic, clinical and laboratory data that were available either at the time of specimen collection (cross-sectional data) or became available at a later time point (longitudinal data). Data were obtained from the patient treatment file (PTF), outpatient clinic file (OPC), and Computerized Patient Record System (CPRS), including lab, Systematized Nomenclature of Medicine (SNOMED), and tumor registry data. Information included 6-84 months follow up on the study patients and control subjects. Information relevant to the patient's diagnosis was obtained (Table 1), which included, but was not limited to, age, ethnicity, serum PSA at the time of surgery, tumor localization, pertinent past medical history related to co-morbidity, other oncological history, family history for cancer, physical exam findings, radiological findings, biopsy date, biopsy result, types of operation performed (radical retropubic or radical perineal prostatectomy), TNM staging, neoadjuvant therapy (i.e. chemotherapy, hormones), adjuvant or salvage radiotherapy, hormonal therapy for a rising PSA (biochemical disease relapse), local vs. distant disease recurrence and survival outcome.

TABLE 1
Patient Demographics.

Mean Age                      68.9 (55-81)
Mean PSA                      8.1 (1.76-24.03)
Months Follow Up              43.9 ± 15 (6-84)

                              Numbers
Biopsy Risk Group
  Low                         29
  Intermediate                26
  High                        16
Gleason Grade Distribution
  5                           2
  6                           8
  7                           31
  8                           26
  9                           3
Relapse
  No                          55
  Yes                         16
Survival
  Alive                       60
  Dead                        11
AJCC TNM Stage
  I                           0
  II                          53
  III                         15
  IV                          3

To assess their integrity, the RNA samples were measured on a Bioanalyzer (FIG. 7). In addition, aliquots of the cDNA reactions were taken for a real-time PCR analysis of a highly expressed housekeeping gene (RPL13A). Highly reproducible gene expression profiles were obtained for the replicates of each FFPE sample (R²=0.99), even though a wide range of RNA degradation was detected in these samples, which had up to an 8-cycle difference in qPCR (i.e. ~170-fold difference in “PCR-able” RNA input) for the RPL13A gene (FIG. 6). Under the conditions used, it was determined that a reasonable expectation of reliable data in the DASL assay was assured if samples did not exhibit a Ct of more than 28 cycles. In addition, similar expression profiles were obtained with RNAs extracted independently from separate cuts of the same paraffin tissue blocks (R²=0.93) (data not shown).
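By way of non-limiting illustration, the relationship between a qPCR cycle difference and the corresponding fold difference in amplifiable RNA input can be sketched as follows. The sketch assumes the conventional exponential model, in which the fold difference equals the per-cycle amplification factor raised to the cycle difference. The function names, the `efficiency` parameter and the per-cycle factor of 1.9 are hypothetical and are not taken from the study; the 28-cycle Ct limit is the one stated above.

```python
def fold_difference(delta_ct: float, efficiency: float = 2.0) -> float:
    """Fold difference in amplifiable template implied by a Ct difference.

    `efficiency` is the assumed per-cycle amplification factor
    (2.0 = perfect doubling); it is an illustrative parameter.
    """
    if efficiency <= 1.0:
        raise ValueError("efficiency must be greater than 1.0")
    return efficiency ** delta_ct


def passes_ct_cutoff(ct: float, max_ct: float = 28.0) -> bool:
    """Pre-qualify a sample: keep it when its Ct does not exceed max_ct."""
    return ct <= max_ct
```

Under these assumptions, an 8-cycle difference corresponds to a 256-fold difference at perfect doubling and to about a 170-fold difference at a per-cycle factor of 1.9, which is in line with the approximate figure given above.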

The randomly ordered BeadArray™ technology (Michael et al., Anal Chem 70, 1242-8 (1998); Walt, Science 287, 451-2 (2000)) has been developed at Illumina as a platform for SNP genotyping (Fan et al., Cold Spring Harb Symp Quant Biol 68:69-78 (2003); Gunderson et al., Nat Genet 37:549-54 (2005)), gene expression profiling (Bibikova et al., Am J Pathol 165:1799-807 (2004); Fan et al., Genome Res 14:878-85 (2004); Kuhn et al., Genome Res 14:2347-56 (2004); Yeakley et al., Nat Biotechnol 20:353-8 (2002)) and DNA methylation detection (Bibikova et al., Genome Res 16:383-93 (2006)). Each array was assembled on an optical fiber bundle consisting of about 50,000 individual fibers fused together into a hexagonally packed matrix. The ends of the bundle were polished, and one end was chemically etched to create a microscopic well in each fiber. These wells were each filled with a 3-micron diameter silica bead. Each derivatized bead had several hundred thousand copies of a particular oligonucleotide covalently attached and available for hybridization. Bead libraries were prepared by conjugation of oligonucleotides to silica beads, followed by quantitative pooling together of the individual bead types. Because the beads were positioned randomly on the array, a decoding process was carried out to determine the location and identity of each bead in every array location (Gunderson et al., Genome Res 14:870-7 (2004)). Each of the 1,624 bead types in the resulting universal array was present at an average redundancy of about 30. Consequently, each assay measurement was the result of data averaged from multiple beads, which increased precision and greatly reduced the possibility of error.

To further increase sample throughput, the arrays were formatted into a matrix, in a pattern that matched the wells of standard 96-well microtiter plates. The matrix format allows streamlined sample handling. By bringing the array to the sample (literally dipping it into the microtiter well), sample and array processing is simplified and integrated for handling of 96 separate samples simultaneously.

A flexible, sensitive, accurate and cost-effective gene expression profiling assay, the DASL (for DNA-mediated annealing, selection, extension and ligation) assay, was used for parallel analysis of over 1500 sequence targets (e.g. 500 genes at 3 probes per gene) (Fan et al., supra 2004). In this assay, two oligos were designed to target a specific gene sequence. Total RNA was first converted to cDNA by random priming. The corresponding query oligos hybridized to the cDNA, and were extended and ligated enzymatically. The ligated products were then amplified and fluorescently labeled during PCR, and finally detected by binding to address sequences on the universal array. The hybridization intensity was used as a measurement of the original mRNA abundance in the sample.

Unlike most of the other array technologies that use an in vitro transcription (IVT)-mediated sample labeling procedure (Phillips and Eberwine, Methods 10, 283-8 (1996)), DASL uses random priming in the cDNA synthesis, and therefore does not depend on an intact poly-A tail for T7-oligo-d(T) priming. In addition, the assay utilizes a relatively short target sequence of about 50 nucleotides for query oligonucleotide annealing, thus allowing microarray analyses of degraded RNAs (Bibikova et al., Am J Pathol 165:1799-807 (2004); Bibikova et al., Clin Chem 50:2384-6 (2004)).

Standard software developed at Illumina was used for automatic image registration (Galinsky, Bioinformatics 19:1832-6 (2003)) and extraction of feature intensities. Briefly, the feature extraction algorithm represents a weighted 6×6 average of pixel intensities. The outlier algorithm was implemented at the feature level (each probe sequence was represented by 30 features on average) to remove features that fell outside of a robust confidence interval of the median response. Array data were normalized using the “rank invariant” method in Illumina's BeadStudio software, with sample VA_73 being the reference sample.
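As a non-limiting sketch of feature-level outlier removal followed by averaging, the replicate features for one probe sequence can be trimmed around their median before the mean is taken. The exact interval used by the Illumina software is not specified here, so the sketch assumes an interval of the median plus or minus three scaled median absolute deviations (MAD); the function name `robust_bead_average`, the `n_mads` parameter and the 1.4826 scaling constant are illustrative assumptions.

```python
from statistics import mean, median


def robust_bead_average(intensities, n_mads=3.0):
    """Average replicate feature intensities after trimming outliers.

    Features farther than n_mads scaled MADs from the median are dropped.
    This interval is an assumed stand-in for the "robust confidence
    interval of the median response"; it is not the vendor algorithm.
    """
    values = list(intensities)
    if not values:
        raise ValueError("at least one intensity is required")
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # More than half of the features equal the median; use it directly.
        return float(med)
    limit = n_mads * 1.4826 * mad  # 1.4826 scales MAD to a normal sigma
    kept = [v for v in values if abs(v - med) <= limit]
    return float(mean(kept))
```

For example, for replicate intensities of 100, 102, 98, 101, 99 and 500, the value 500 falls outside the assumed interval and the remaining five features average to 100.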

In the interest of having the tissue samples mimic the clinical situation as much as possible, 71 tumors with various Gleason grades were used. Some tumors have uniformly one grade, while others have a primary and a secondary grade. For tumors with one grade, the grade was counted twice to give a Gleason Score (GS; e.g. Gleason Grades 3+3=Gleason Score 6). Gleason Scores of tumors with two patterns were the sum of the two Gleason Grades (e.g. primary Gleason Grade 4 and secondary Gleason Grade 3 would have a Gleason Score of 7). All 71 tumor samples contained tumor content higher than 10% and inflammation content less than 5%. For each gene, Pearson's correlation coefficient was computed between its expression level and Gleason score. P-values were assigned to observed correlations by a permutation test. Sample labels were randomly permuted 10,000 times and the correlation values were determined. For each gene, the p-value is the fraction of random permutations that resulted in a higher correlation value than the one seen with correct sample labels. A cutoff value of 14/10,000, which corresponds to a false discovery rate (FDR) adjusted p-value=0.05, was used and a list of 16 genes was obtained. For all the selected genes, a fitted linear model was used (using the rlm function with method “MM” in the MASS library of the R statistical package) to predict Gleason grades, and the average of 16 independently derived prediction scores was used as a gene expression analogy (GEX score) of the Gleason score. Kaplan-Meier analysis was performed using the survdiff function from the survival library of the R package with parameters corresponding to a log-rank test.
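The permutation test described above can be sketched, by way of non-limiting illustration, as follows. The sketch assumes that the absolute value of the correlation is compared, so that negatively correlated genes can also reach significance; that choice, the function names and the fixed random seed are illustrative assumptions and are not taken from the study.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def permutation_p_value(expression, gleason, n_perm=10_000, seed=0):
    """Fraction of label permutations giving a stronger correlation.

    Assumption: correlations are compared by absolute value.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(expression, gleason))
    labels = list(gleason)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # random reassignment of sample labels
        if abs(pearson_r(expression, labels)) > observed:
            exceed += 1
    return exceed / n_perm
```

A gene would then be retained when its p-value does not exceed the stated cutoff, for example `permutation_p_value(expr, gs) <= 14 / 10_000`, where `expr` and `gs` are hypothetical per-sample lists of expression values and Gleason scores.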

Differential gene expression is present within prostate carcinomas of patients with various degrees of Gleason grade, thus contributing to different clinical outcomes within the groups. The permutation method described above was used to identify genes that were either positively or negatively correlated with Gleason score and generated a panel of 11 positively correlated genes: CCNE2, CDC6, FBP1, HOXC6, MKI67, MYBL2, PTTG1, RAMP, UBE2C, Wnt5A, MEMD; and 5 negatively correlated genes: AZGP1, CCK, MLCK, PPAP2B, and PROK1. The 16 genes can be classified into several groups based on their biological functions: (1) proliferation: MKI67, MYBL2, Wnt5A, PTTG1, AZGP1, and PROK1; (2) cell cycle: CCNE2 (cyclin E2), CDC6, MKI67, MYBL2, PTTG1, UBE2C; (3) differentiation: HOXC6, Wnt5A; (4) cell adhesion: MEMD, AZGP1, and MLCK; (5) signal transduction: Wnt5A, CCK, MLCK, and UBE2C; (6) basic metabolism: FBP1, AZGP1, PPAP2B, and RAMP (a protease).

TABLE 2 Pearson's correlation coefficient (r) between gene expression and Gleason score, and the p-value calculated from the permutation test.

GeneID    Correlation    Pval
AZGP1     −0.35498       2.00E−04
CCK       −0.34259       9.00E−04
CCNE2     0.364285       9.00E−04
CDC6      0.372737       2.00E−04
FBP1      0.337464       0.0013
HOXC6     0.503815       0
MKI67     0.392234       0
MLCK      −0.34564       7.00E−04
MYBL2     0.37886        0
PPAP2B    −0.35176       8.00E−04
PROK1     −0.36189       7.00E−04
PTTG1     0.382119       4.00E−04
RAMP      0.448571       0
UBE2C     0.325166       0.0011
Wnt5A     0.394576       4.00E−04
memD      0.351226       0.0014

Based on the expression profiles of these 16 genes, a Gene EXpression score (GEX, an expression analogy of Gleason score) was calculated. The GEX score had a better correlation (r=0.62) with Gleason score than any individual gene. The GEX score exhibited a nonlinear pattern, in which the expression signature score was flat when GS<7, and then started rising around GS=7 and GS=9 (FIG. 1), pointing to the presence of three distinct molecular stages among the prostate cancer patients: [GS5-N, GS6-N, GS6-Y and GS7-N], [GS7-Y, GS8-N and GS8-Y], and GS9-Y. These stages may correspond to Gleason pattern grades 3, 4, and 5, respectively. Patients who experienced relapse tended to have higher GEX scores despite having identical Gleason scores (see FIG. 1: GS6-Y vs. GS6-N, GS7-Y vs. GS7-N, and GS8-Y vs. GS8-N).
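The way a GEX score combines per-gene predictions can be illustrated with a short sketch. It assumes, as one reading of the description, that each gene gets its own linear model of Gleason score against expression and that the GEX score of a sample is the mean of the per-gene predictions. Ordinary least squares is used here as a plainly simpler stand-in for the robust MM-estimator fit (rlm) named in the text, and the names `fit_line` and `gex_scores` as well as the toy data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept.

    Stand-in for the robust rlm fit described in the text (an assumption).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def gex_scores(expression_by_gene, gleason):
    """Average the per-gene predicted Gleason scores for every sample."""
    totals = [0.0] * len(gleason)
    for values in expression_by_gene.values():
        slope, intercept = fit_line(values, gleason)
        for i, value in enumerate(values):
            totals[i] += slope * value + intercept  # this gene's prediction
    return [total / len(expression_by_gene) for total in totals]


# Toy data (not from the study): one positively and one negatively correlated gene.
toy_gleason = [6, 7, 8, 9]
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],  # rises with Gleason score
    "geneB": [8.0, 6.0, 4.0, 2.0],  # falls with Gleason score
}
toy_gex = gex_scores(toy_expression, toy_gleason)
```

Because each gene is fitted separately, negatively correlated genes simply receive a negative slope, so both kinds of gene contribute predictions on the same Gleason-like scale.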

In order to find out whether the GEX was significantly different in tumor versus non-tumor samples, a total of 126 samples of FFPE cancer (N=79) and non-cancer (N=47) prostate tissues were profiled. “Cancer” sections included 10-90% adenocarcinoma in the block. The mean GEX on the cancer tissues was 7.38+/−0.35 and the mean GEX on non-cancer prostate tissues was 7.2+/−0.16 (p=0.0013), indicating that the GEX scores differed significantly between cancer and benign prostate tissues.

TABLE 3 512 genes selected for this study. Gene-Symbol GenBank ID ABCA5GI_27262623 ABCF3 GI_8922935 ACADSB GI_38373685 ACPP GI_6382063 ADAMTS1GI_11038653 ADD2 GI_9257191 ADPRT GI_11496989 AKAP2 GI_22325354 AKR1C3GI_24497582 ALDH1A2 GI_25777723 ALDH4A1 GI_25777733 ALG-2 GI_22027539ALOX15B GI_4557308 AMACR GI_31541879 AML1 GI_19923197 ANGPT2 GI_4557314ANGPTL2 GI_34577067 ANPEP GI_4502094 ANTXR1 GI_16933552 ANXA2 GI_4757755AP2B1 GI_4557468 APRIN GI_7657268 AQP3 GI_22165421 AR GI_21322251 ARD1GI_34222259 AREG GI_22035683 ARF6 GI_6996000 ARFGAP3 GI_28416437 ARFIP2GI_6912601 ARHGEF7 GI_22027526 ATF2 GI_22538421 ATP2C1 GI_7656909ATP6V1E2 GI_33669104 AZGP1 GI_38372939 BART1 GI_17978472 BAX GI_34335114BBC1 GI_15431296 BBC3 GI_24475588 BC008967 GI_24308353 BC-2 GI_38372936BCATm GI_4502374 BCL2A GI_4557354 BCL2B GI_4557356 BDH GI_34304349 BGNGI_34304351 BHC80 GI_19923461 BIK GI_21536418 BMP5 GI_24797149 BMP7GI_4502426 BMPR1B GI_4502430 BNIP3 GI_7669480 BTF3 GI_29126237 BTG2GI_28872718 C18orf8 GI_21361441 C20orf46 GI_8922926 C6orf56 GI_7662247C7 GI_4557386 CALD1 GI_15149468 CALM1 GI_31377794 CAMKK2 GI_27437014CANX GI_31542290 CAPL GI_9845514 CAPZB GI_4826658 CAV1 GI_15451855 CAV2GI_38176290 CCK GI_4755130 CCNE2 GI_17318566 CD24 GI_7019342 CD38GI_38454325 CD3G GI_339406 CD44 GI_21361192 CDC42BPA GI_30089961 CDC6GI_16357469 CDH1 GI_14589887 CDH11 GI_16306531 CDKN1B GI_17978497 CDKN2AGI_17738299 CDKN2B GI_17981693 cDNA clone GI_10437016 cDNA cloneGI_1178507 cDNA clone GI_1580637 cDNA clone GI_16550429 cDNA cloneGI_2056367 cDNA clone GI_22761402 cDNA clone GI_3043194 cDNA cloneGI_4884218 cDNA clone GI_6504179 cDNA clone GI_674501 cDNA cloneGI_6993120 cDNA clone GI_9120119 cDNA clone GI_9877016 cDNA cloneGI_1307897 cDNA clone GI_1963114 cDNA clone GI_3253738 cDNA cloneGI_1193025 cDNA clone GI_1628918 cDNA clone GI_2094528 cDNA cloneGI_2103530 cDNA clone GI_2805998 cDNA clone GI_3181305 cDNA cloneGI_3253412 cDNA clone GI_3596138 cDNA clone GI_839562 cDNA cloneGI_880122 cDNA clone 
GI_2325568 cDNA clone GI_1309053 cDNA cloneGI_3360414 CDS2 GI_22035625 CES1 GI_16905523 CETN2 GI_4757901 CHAF1AGI_4885106 CHGA GI_10800418 CKAP4 GI_19920316 CKTSF1B1 GI_37693998 CLDN7GI_34222214 CLU GI_4502904 CLUL1 GI_34222143 c-maf GI_3335147 CNN1GI_34222150 COBLL1 GI_7662427 COL1A1 GI_14719826 COL1A2 GI_21536289COL3A1 GI_15149480 COL4A1 GI_17017989 COL4A2 GI_17986276 COL5A2GI_16554580 COPE GI_31542318 COPEB GI_37655156 CPXM GI_29171731 CRISP3GI_5174674 CRYAB GI_4503056 CSMD1 GI_15100167 CSPG2 GI_21361115 CST3GI_19882253 CTBP1 GI_4557496 CTHRC1 GI_34147546 CTSH GI_23110954 CXCL1GI_4504152 CYP1B1 GI_13325059 DAT1 GI_21361801 DC13 GI_9910183 DCCGI_4885174 DCK GI_4503268 DD3 GI_6165973 DDR1 GI_38327631 DEPC-1GI_21040274 DF GI_4503308 DHCR24 GI_13375617 DHPS GI_7108341 DIO2GI_7549804 DKFZp434C0931 GI_32880207 DKFZP564B167 GI_7661601DKFZp586J0119 GI_26986531 DKFZp586N1423 GI_4729049 DKFZp761D221GI_14150038 DLG2 GI_4557526 DLG3 GI_10863920 DNAH5 GI_19115953 D-PCa-2GI_27734694 D-PCa-2 GI_30314327 D-PCa-2 GI_30314331 drn3 GI_18375529ECT2 GI_21735571 EDNRB GI_4557546 EEF1G GI_25453475 EEF2 GI_25453476EGFR GI_29725608 EGR1 GI_31317226 EIF4EL3 GI_4757701 ELAC2 GI_34147640ERBB2 GI_4758297 ERBB3 GI_4503596 ERG1 GI_33667106 ERG2 GI_7657065 ESM1GI_13259505 EXOC7 GI_24308034 EXT1 GI_4557570 EZH2 GI_23510382 F2RGI_6031164 F5 GI_10518500 FASN GI_21618358 FAT GI_4885228 FBP1GI_16579887 FGF18 GI_4503694 FGF2 GI_15451897 FGF4 GI_4503700 FGFR2GI_13186258 FGR GI_4885234 FLJ12443 GI_33946290 FLJ30473 GI_21389616FLT1 GI_32306519 FOLH1 GI_4758397 FOS GI_6552332 FRZB GI_38455387 FSTL1GI_34304366 FZD7 GI_4503832 G2AN GI_38371757 G6PD GI_21614519 GABRG2GI_4557610 GAGEC1 GI_19747284 GALNT1 GI_13124890 GALNT3 GI_9945386GARNL3 GI_34222344 GAS1 GI_4503918 GDEP GI_24475750 GDF15 GI_4758935GDI2 GI_6598322 GJA1 GI_4755136 GMPS GI_4504034 GNAZ GI_4504050 GNEGI_6382074 GPR126 GI_37620168 GPR43 GI_4885332 GRP GI_34222290 GRPRGI_4885360 GSPT1 GI_4504166 GSPT2 GI_8922423 GSTA1 GI_22091453 
GSTM1GI_23065543 GSTM3 GI_23065551 GSTM4 GI_23065554 GSTM5 GI_23065562 GSTP1GI_6552334 GUCY1A3 GI_4504212 hAG-2/R GI_20070225 HDAC9 GI_7662279 HGFGI_33859834 HLA-DPB1 GI_24797075 HLTF GI_21071051 HMG20B GI_5454079HNF-3 alpha GI_24497500 HNMP-1 GI_4503562 HNRPAB GI_14110401 HOXC6GI_24497542 HPN1 GI_33695154 HPN2 GI_4504480 hRVP1 GI_21536298 HSA250839GI_8923753 HSD17B4 GI_4504504 HUEL GI_7656945 ID2 GI_33946335 IER3GI_16554596 IFI27 GI_5031780 IGF1 GI_19923111 IGF2 GI_6453816 IGFBP2GI_10835156 IGFBP3 GI_19923110 IGFBP5 GI_46094066 IL1R1 GI_27894331 ILKGI_4758605 ILKAP GI_29171685 IMPDH2 GI_4504688 INHBA GI_4504698 ITGA1GI_20545279 ITGA5 GI_4504750 ITGB1 GI_19743812 ITGB3 GI_4557676 ITGBL1GI_4758613 ITPR1 GI_10835022 ITPR3 GI_4504794 ITSN GI_3859852 JUNBGI_4504808 KAI1 GI_13259537 KCNRG GI_27734696 KHDRBS3 GI_5730072KIAA0003 GI_21328452 KIAA0152 GI_7661947 KIAA0172 GI_23510374 KIAA0389GI_4826845 KIAA0664 GI_24308018 KIAA0869 GI_29789057 KIAA1109GI_42656961 KIAA1946 GI_29126182 KIAK0002 GI_16950656 KIP GI_9951921KIP2 GI_4557440 KLK2 GI_20149573 KLK3 GI_22208990 KLK4 GI_24234714 KNTC2GI_5174456 K-ras GI_34485723 KRT12 GI_4557698 KRT13 GI_24234693 KRT15GI_24430189 KRT5 GI_17318577 KRT8 GI_4504918 LAMA4 GI_9845494 LAMB1GI_4504950 LAMR1 GI_9845501 LDHA GI_5031856 LIM GI_5453713 LIMK1GI_8051616 LIPH GI_21245105 LMNB1 GI_27436949 LOC119587 GI_39930572LOC129642 GI_20270350 LOC283431 GI_28372562 LOC400665 GI_42661841LOC92689 GI_29789372 LOX GI_21264603 LSAMP GI_4505024 LTB4DH GI_34222094LTBP2 GI_4557732 LTBP4 GI_4505036 LU GI_31543105 LUM GI_21359858 MADH4GI_34147555 MAL GI_12408666 MAP2K1IP1 GI_21614526 MAP3K10 GI_21735549MCCC2 GI_14251210 MCM2 GI_33356546 MCM3 GI_33356548 MCM4 GI_33469918MCM5 GI_23510447 MCM6 GI_33469920 MCM7 GI_33469967 MEIS2 GI_27502374MELK GI_7661973 memD GI_3183974 MET GI_4557746 MGC45594 GI_31342226MIC-1 GI_2674084 MKI67 GI_19923216 MLCK GI_16950610 MLP GI_32401423 MMEGI_6042205 MMP1 GI_13027798 MMP14 GI_13027797 MMP2 GI_11342665 MMP7GI_13027804 MMP9 
GI_4826835 MNAT1 GI_4505224 MOAT-B GI_34452699 MPDZGI_4505230 MS4A7 GI_23110999 MSR1 GI_20357509 MT3 GI_5174761 MYBL2GI_31652260 MYC GI_31543215 N2A3 GI_2967518 NBL1 GI_33519445 NDUFA2GI_32171239 NEFH GI_32483415 NELL2 GI_5453765 NETO2 GI_24041025 NGFBGI_4505390 NIPA2 GI_34147393 NKX3-1 GI_19923351 nm23-H2 GI_4505408 NME1GI_38045911 NMU GI_5729946 NOS1 GI_10835172 NOS2A GI_24041028 NOX4GI_20149638 NR4A1 GI_27894342 NRAS GI_6006027 NRG1 GI_4758525 NRIP1GI_4505454 NSP GI_10863934 NTN1 GI_4758839 NTRK3 GI_4505474 NUDT3GI_37622350 NY-REN-41 GI_18087816 ODC1 GI_4505488 OPRS1 GI_22212932ORC6L GI_32454755 OSBPL8 GI_22035617 OXCT GI_4557816 P1 GI_31542946 P4HBGI_20070124 PAICS GI_17388802 PART1 GI_11496986 PCGEM1 GI_11066459 PCNAGI_33239449 PDE3B GI_4505660 PDGFRB GI_15451788 PDLIM7 GI_11496884 PECIGI_5174624 PEX5 GI_37059745 PGM3 GI_7661567 PIM1 GI_31543400 PKCI-1GI_29135342 PLA2G2A GI_20149501 PLA2G7 GI_23512330 PLS3 GI_28416938 PMI1GI_4505234 PPAP2B GI_29171739 PPFIA3 GI_32189361 PPP1CB GI_4506004PPP1R12A GI_4505316 PRC1 GI_4506038 PRKCL2 GI_5453973 PRO1489 GI_7959775PROK1 GI_14165281 Prostein GI_14916436 PRSS8 GI_21536453 PSCAGI_29893565 PSM GI_190663 PSK GI_7706400 PTEN GI_4506248 PTGDRGI_28466968 PTGDS GI_32171248 PTGS2 GI_4506264 PTK9 GI_31543447 PTOV1GI_33695089 PTTG1 GI_11038651 PYCR1 GI_24797096 RAB2 GI_4506364 RAB3BGI_19923749 RAB5A GI_31543538 RAB6B GI_7706674 RAMP GI_7705575 RANGI_6042206 RANGAP1 GI_38201688 rap1GAP GI_4506414 RB1 GI_4506434 RBM5GI_5032030 RBP1 GI_8400726 REPS2 GI_4758943 RET GI_21536316 RFC4GI_31881681 RGS10 GI_11184225 RGS11 GI_4506506 RGS5 GI_4506518 RIGGI_5454007 RNASEL GI_30795246 ROBO1 GI_19743804 RPL13A GI_14591905RPL18A GI_15431299 RPLP0 GI_16933547 RPS2 GI_15055538 RRAS GI_20127497RRN3 GI_21361630 SALL4 GI_37595567 SCUBE2 GI_10190747 SEC14L2 GI_7110714SELENBP1 GI_16306549 SEPP1 GI_4885590 SERPINB5 GI_4505788 SERPINF1GI_34098937 SFN GI_30102938 SGK GI_25168262 SIAT1 GI_27765094 SIAT7DGI_28373089 SIM2 GI_7108363 SLC14A1 GI_7706676 
SLC25A6 GI_27764862SLC2A3 GI_5902089 SLC39A6 GI_12751474 SLC43A1 GI_34222288 SLIT3GI_11321570 SND1 GI_7657430 SOCS2 GI_21536304 SOLH GI_5032104 SPARCGI_4507170 SPARCL1 GI_21359870 SPDEF GI_6912579 SPOCK GI_15451924 SQRDLGI_10864010 SRD5A2 GI_4557854 Stac GI_4507246 STEAP GI_22027487 STEAP2GI_25092600 STK39 GI_7019542 STOM GI_38016910 STRA13 GI_21450710 SULF1GI_29789063 SYNE1 GI_41281986 SYT7 GI_38194226 TACSTD1 GI_4505058 TBXA2RGI_27545324 TCF2 GI_6031204 TFAP2C GI_19923162 TFCP2 GI_34147661 TGFAGI_4507460 TGFB1 GI_10863872 TGFB2 GI_4507462 TGFB3 GI_4507464 TGFBR3GI_4507470 TGM4 GI_4507478 THBD GI_4507482 THBS1 GI_4507484 TIMP1GI_4507508 TIMP2 GI_9257247 TMEPAI GI_21361840 TMSNB GI_11496272 TNFRSF6GI_23510419 TNFSF10 GI_23510439 tom1-like GI_4885638 TP53 GI_8400737TP73 GI_4885644 TP73L GI_31543817 TRAF2 GI_22027611 TRAF4 GI_22027621TRAF5 GI_22027625 TRAP1 GI_7706484 TRIM29 GI_17402908 TROAP GI_33438581TRPM8 GI_21361690 TSPAN-1 GI_21264577 TSPYL5 GI_29789280 TU3A GI_6005923TUSC3 GI_30410787 TYMS GI_4507750 UAP1 GI_34147515 UB1 GI_30089964 UBE2CGI_32967292 UBE2L6 GI_38157980 UBE2S GI_7657045 UCHL5 GI_7706752 UNC5CGI_16933524 VCL GI_7669551 WISP1 GI_18490998 Wnt5A GI_17402917 XBP1GI_14110394 XLKD1 GI_5729910 ZABC1 GI_5730123 ZAKI-4 GI_5032234 ZFP36GI_4507960

Example II Gene Expression Profiles Predict Relapse of Prostate Cancer

This example shows the correlation between GEX score for a collection of 16 genes and prostate cancer relapse.

As shown in FIG. 2, there was a good correlation between the GEX score and relapse, with a near-linear increase in the percentage of relapse cases for GEX scores between 7 and 7.6. For instance, when the GEX score was 7.4, approximately 75% of the cases relapsed. When the GEX score reached 7.8, 100% of the cases relapsed. It is worth noting that the average GEX score was 7.2 for GS7 patients without relapse and 7.4 for GS7 patients who relapsed, corresponding to approximately 20% and 75% chance of relapse, respectively.

The Receiver Operating Characteristic (ROC) curve showed that the 16 gene expression signature was more predictive of relapse than Gleason score (FIG. 3). The GEX score had an AUC (Area Under the Curve) of 0.73, which was better than Gleason score with an AUC of 0.65. The sensitivity and specificity of relapse prediction were 0.69 and 0.69 for GEX score >7.337 (corresponding to the mean GEX score of all samples with Gleason scores 7 and 8), and 0.56 and 0.65 for Gleason score >7. In particular, the GEX score improved the relapse prediction in patients with a Gleason score of 7 (see FIG. 1), benefiting from being a continuous analogy of Gleason score.
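For readers less familiar with these measures, the following sketch shows how sensitivity, specificity, and AUC can be computed from a list of scores and relapse outcomes. The function names and the toy data are illustrative assumptions and do not reproduce the study's samples; only the cutoff value 7.337 is taken from the text. The AUC is computed by the standard pairwise-comparison definition, that is, the fraction of (relapse, no-relapse) pairs in which the relapsed case has the higher score, with ties counted as one half.

```python
def sensitivity_specificity(scores, relapsed, cutoff):
    """Predict relapse when score > cutoff; return (sensitivity, specificity)."""
    pairs = list(zip(scores, relapsed))
    true_pos = sum(1 for s, r in pairs if r and s > cutoff)
    false_neg = sum(1 for s, r in pairs if r and s <= cutoff)
    true_neg = sum(1 for s, r in pairs if not r and s <= cutoff)
    false_pos = sum(1 for s, r in pairs if not r and s > cutoff)
    return true_pos / (true_pos + false_neg), true_neg / (true_neg + false_pos)


def auc(scores, relapsed):
    """Area under the ROC curve via pairwise comparison of cases and controls."""
    positives = [s for s, r in zip(scores, relapsed) if r]
    negatives = [s for s, r in zip(scores, relapsed) if not r]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy data (not from the study).
toy_scores = [7.1, 7.2, 7.3, 7.4, 7.5, 7.6]
toy_relapsed = [False, False, True, False, True, True]
toy_sens, toy_spec = sensitivity_specificity(toy_scores, toy_relapsed, cutoff=7.337)
toy_auc = auc(toy_scores, toy_relapsed)
```

A continuous score such as GEX allows the cutoff to be moved along the ROC curve, whereas a Gleason score offers only a handful of possible thresholds.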

Patients who experienced relapse tended to have higher GEX scores despite having identical Gleason scores. The most pronounced difference was observed in GS7 patients (two-sided t-test p=0.005, when GS7-Y was compared to GS7-N, FIG. 1). The GEX scores, when divided into groups of GEX>7.3 and GEX<=7.3 (the cutoff was chosen as the median GEX for GS7 and GS8 samples), had a significant correlation with subsequent relapse in the Kaplan-Meier analysis (FIG. 4, p=0.007). Among the GS7 patients, 1/21 of GS3+4 and 4/11 of GS4+3 relapsed, respectively (Fisher Exact test p=0.037). The mean GEX scores were 7.236 and 7.305 for the two groups, respectively (p=0.071 for hypothesis testing of an increased GEX score for 4+3 patients). GS alone was associated with relapse versus no relapse (p=0.02). Neither the tumor stage nor the risk groups assigned at the time of biopsy significantly correlated with relapse (Kaplan-Meier analysis, p=0.07 and 0.1, respectively).
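The Fisher Exact test on the GS7 subgroups can be checked by hand from the counts given above (1 of 21 GS3+4 patients and 4 of 11 GS4+3 patients relapsed). The sketch below computes the hypergeometric probabilities directly; the function name is an assumption, and the source does not state whether a one-sided or two-sided test was used. For this particular table the two coincide, because the only outcomes at least as unlikely as the observed one are those with zero or one relapse in the GS3+4 group.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(k):
        # Hypergeometric probability of k events in the first row.
        return comb(row1, k) * comb(row2, col1 - k) / comb(total, col1)

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    return sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= observed * (1 + 1e-9)  # small tolerance for float ties
    )


# Rows: GS3+4 and GS4+3; columns: relapse, no relapse.
p_gs7 = fisher_exact_two_sided(1, 20, 4, 7)
print(round(p_gs7, 3))  # 0.037
```

The exact value is 7392/201376, which rounds to the p=0.037 reported above.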

Only samples (N=71: 55 without relapse and 16 with relapse) with no residual tumor after surgery were considered (with the exception of three cases, which had non-detectable serum PSA after surgery but had positive margins), and patients who received therapy and had no relapse were excluded. Consequently, the lack of relapse could not be due to selective therapies but has to be due to the underlying biological differences among these patients.

GEX scores of the 34 matched, non-tumor prostate tissues also generated a mean of 7.19±0.18, lower than that in the tumor samples (7.25±0.25), and the GEX on non-tumor tissues appeared to correlate with relapse on Kaplan-Meier analysis (p=0.04), although not as significantly as the tumor itself.

Four cases of FFPE prostate cancer tissue sections were de-paraffinized in xylene and re-hydrated in ethanol. Antigen retrieval was performed by steam heating with 1× DAKO Target Retrieval solution. The sections were then allowed to cool to room temperature in the solution. The endogenous peroxidase was removed by 3% H₂O₂. Non-specific binding of biotin and avidin was blocked by blocking solution for 30 minutes (Protein Block Serum-Free, DAKO, Carpinteria, Calif.). The background staining was reduced by incubation with goat serum (1:20 dilution) for 60 minutes. Primary antibodies (HOXC6 1:100, Aviva Systems Biology, and Ki-67 1:200, DAKO) were placed on slides and incubated for 1 hour at room temperature in the case of Ki-67 and overnight at room temperature in the case of HOXC6. Secondary antibodies conjugated with Streptavidin/HRP (LSAB2, DAKO) were used. The slides were washed and the antibody complex was visualized by 3,3′-diaminobenzidine (DAB, DAKO). The nuclei were counterstained by Gill's II Hematoxylin. Immunoactivity in the tissues was estimated by counting the number of positive cells per 1,000 tumor cells. Cases were considered positive if more than 20% of the tumor cells were staining.

As shown in FIG. 5, immunohistochemistry of HOXC6 showed distinct staining of the tumor nuclei. Compared to adjacent benign prostatic glands, the stain appeared to be restricted to tumor glands, although some benign glands also seemed to have weak signals. The findings supported the gene expression results described above. Ki67 also showed nuclear staining in the tumor cells; however, the staining varied quantitatively between glands.

The gene expression score (GEX) derived from the expression levels of the 16 genes was used to predict relapse of prostate cancer. It is worth pointing out that there was no “training/fitting” made toward their prognostic power at the gene selection step. Making a continuous analogy of Gleason grade increased molecular resolution, especially at GS=7-8, in which patients can be stratified better based on their gene expression profiles (FIG. 1); in turn, this translates to a good predictor of relapse for prostate cancer (FIG. 4).

Interestingly, the GEX score exhibited a nonlinear pattern, in which the expression signature score stayed flat when GS<7, started rising at GS=7, and plateaued at GS=9 (FIG. 1). This suggests that there may be three distinct molecular stages among the prostate cancer patients, and this may have corresponded to Gleason scores 6, 7, and 8, respectively. GEX profiles can potentially identify a subset of histologically intermediate-grade tumors that have more aggressive clinical behavior, i.e. to separate out GS7 patients who were more likely to relapse.

GEX scores were also calculated in the “matched” non-tumor prostate tissues in the same population. The GEX scores were lower than those seen in the tumor glands but the scores had a statistically significant correlation with disease relapse (p=0.04). However, the statistical significance was much less than that of the GEX in the tumor glands (p=0.007). It is possible that the most robust signals generated by GEX came from the tumor glands. However, it is highly plausible that the stromal signals also contributed to the overall GEX score. Additional clinical trials can be performed to test whether non-tumor stroma generates signals that can predict relapse. For example, small samples, such as those obtained using LCM in needle biopsies, with or without tumor glands, can be used to assess prognostic outcomes prior to definitive therapy.

The sensitivity and specificity of the 16 gene markers can be validated in another independent cohort and prospectively in clinical trials where patients undergo prostate needle biopsy for diagnosis of carcinoma. This test can give information on diagnosis as well as prognosis through needle biopsy samples. Such a test provides a mature diagnostic test that is technically simple and applicable for routine clinical use, and can eventually be incorporated into existing prostate cancer nomograms.

In sum, the above examples show identification of signature genes and subsequent compilation of a set of signature genes capable of establishing a molecular signature that can be used to predict relapse in individuals treated for prostate cancer. The molecular signature described herein replicates across independent sample sets and represents a better expression signature with higher specificity and sensitivity than previously available. As described herein, the principles underlying these examples can be applied to other cancer types as well as other conditions.

Example III An Expanded Collection of Signature Genes that Predict Relapse of Prostate Cancer

This example shows the correlation between GEX score for a collection of 21 genes and prostate cancer relapse.

The correlation between GEX score and Gleason score was re-evaluated for the set of 512 genes shown in Table 3 as follows. GEX score was determined for each of the 512 genes in each of the 71 samples as described in Example I. All genes that had a correlation outside of the median ±2 standard deviations were identified. This resulted in the addition of the GI_2094528, KIP2, NRG1, NBL1, and Prostein genes to the list of 16 genes described in Examples I and II. The resulting collection of 21 genes is shown in Table 4.
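The median ±2 standard deviations filter can be sketched as follows. This is an illustration only: the function name, the use of the sample (rather than population) standard deviation, and the toy correlation values are all assumptions, since the text does not specify them.

```python
from statistics import median, stdev


def genes_outside_band(correlations, n_sd=2.0):
    """Return the genes whose correlation lies outside median +/- n_sd * SD.

    `correlations` maps gene name to its correlation with Gleason score.
    The sample standard deviation is used here (an assumption).
    """
    values = list(correlations.values())
    center = median(values)
    spread = stdev(values)
    lower = center - n_sd * spread
    upper = center + n_sd * spread
    return sorted(g for g, r in correlations.items() if r < lower or r > upper)


# Toy data (not from the study): 20 weakly correlated genes and two outliers.
toy_correlations = {f"weak_neg_{i}": -0.05 for i in range(10)}
toy_correlations.update({f"weak_pos_{i}": 0.05 for i in range(10)})
toy_correlations["UP"] = 0.5
toy_correlations["DOWN"] = -0.5
toy_selected = genes_outside_band(toy_correlations)
```

Because the band is symmetric around the median, the filter picks up both strongly positive and strongly negative correlations, which is consistent with Table 4 containing genes of both signs.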

TABLE 4 Collection of 21 signature genes

GeneID        Correlation
CCNE2         0.36426
CDC6          0.37274
FBP1          0.33746
HOXC6         0.50382
MKI67         0.39223
MYBL2         0.37886
PTTG1         0.38212
RAMP          0.44857
UBE2C         0.32517
Wnt5A         0.39458
memD          0.35123
GI_2094528    0.30986
KIP2          −0.31881
NRG1          −0.27405
NBL1          −0.32061
Prostein      −0.26023
AZGP1         −0.35498
CCK           −0.34259
MLCK          −0.34564
PPAP2B        −0.35176
PROK1         −0.36189

The list of 21 genes in Table 4 was evaluated to rank the genes according to the probability that each individual gene was included in a set of genes that was correlated with Gleason score. The ranked list of 21 genes is shown in Table 5. The rank was determined as follows. All combinations of the 21 genes were placed in a Boolean matrix having 21 columns (1 for each gene) and 2,097,151 lines (the number of non-empty combinations of 1 to 21 genes out of 21, i.e. 2^21 − 1). The GEX scores were calculated for each combination using this matrix and the results of the fitted linear models (using the rlm function as described in Example I). A Kaplan-Meier analysis was run for each combination and the resulting p-values were stored. The distribution of p-values was plotted for each of the 21 genes and the 3rd quartile p-value was determined for each gene. The gene with the highest 3rd quartile p-value was removed (i.e. Prostein, which had a 3rd quartile p-value of 0.053866667 as shown in Table 5). The process was repeated for a Boolean matrix of the 20 remaining genes (i.e. all genes in Table 5 except Prostein). In the second repetition NBL1 was identified, and the process was repeated again for a matrix of 19 genes including those listed in Table 5 with the exception of Prostein and NBL1. The process was repeated until all genes had been ranked such that the first gene to be removed was ranked last and the final remaining gene was ranked first.
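The ranking procedure is a form of backward elimination over all gene combinations, and its control flow can be sketched as below. Everything beyond the control flow is an assumption: `p_value_for` is a hypothetical callback standing in for the GEX scoring and Kaplan-Meier analysis of one combination, the quartile is computed with Python's inclusive method (the source does not state the quartile definition), and the toy callback is invented so that the sketch runs on three genes. Enumerating all 2,097,151 combinations of 21 genes with a real survival analysis would be far slower than this toy suggests.

```python
from itertools import combinations
from statistics import quantiles


def n_nonempty_subsets(n_genes):
    """Number of rows in the Boolean matrix: all non-empty gene combinations."""
    return 2 ** n_genes - 1


def third_quartile(values):
    """3rd quartile using the inclusive method (an assumed definition)."""
    return quantiles(values, n=4, method="inclusive")[2]


def rank_by_backward_elimination(genes, p_value_for):
    """Repeatedly drop the gene with the highest 3rd-quartile p-value.

    `p_value_for(combo)` is a hypothetical callback returning the survival
    p-value for a tuple of genes. Returns genes ordered from rank 1 to last.
    """
    remaining = list(genes)
    removed = []
    while len(remaining) > 1:
        p_by_gene = {gene: [] for gene in remaining}
        for size in range(1, len(remaining) + 1):
            for combo in combinations(remaining, size):
                p = p_value_for(combo)
                for gene in combo:
                    p_by_gene[gene].append(p)
        worst = max(remaining, key=lambda g: third_quartile(p_by_gene[g]))
        remaining.remove(worst)
        removed.append(worst)
    # The last surviving gene is rank 1; the first removed gene is ranked last.
    return remaining + removed[::-1]


# Toy callback (not from the study): mean of invented per-gene p-values.
toy_p = {"A": 0.001, "B": 0.01, "C": 0.05}
toy_ranking = rank_by_backward_elimination(
    ["C", "A", "B"], lambda combo: sum(toy_p[g] for g in combo) / len(combo)
)
```

Using the 3rd quartile rather than the minimum p-value means a gene is judged by how it behaves across most of the combinations it takes part in, not by its single best combination.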

TABLE 5 Ranking of 21 genes

Rank  Gene         1st quartile p-value  median p-value  3rd quartile p-value
1     MKI67        0.005149              0.005149        0.005149
2     GI_2094528   0.000725185           0.001363455     0.0020022
3     HOXC6        0.000229413           0.0009615       0.002487667
4     CCK          0.000301025           0.000641625     0.00263825
5     memD         9.14E−05              0.00033542      0.00104746
6     FBP1         9.19E−05              0.000396633     0.001919933
7     CDC6         5.16E−05              0.000166507     0.0008572
8     PROK1        5.60E−05              0.000189093     0.00107155
9     MYBL2        6.40E−05              0.000227222     0.001027144
10    UBE2C        6.93E−05              0.00026413      0.00109623
11    PTTG1        0.000119859           0.000440473     0.001409827
12    KIP2         0.000137185           0.000467458     0.001629417
13    NRG1         0.000170895           0.000594477     0.002144154
14    AZGP1        0.000262736           0.000879486     0.0029995
15    MLCK         0.000320207           0.00114972      0.004122933
16    Wnt5A        0.000291781           0.001149381     0.004899125
17    PPAP2B       0.000332412           0.001535388     0.007228882
18    CCNE2        0.000598356           0.002471444     0.0099375
19    RAMP         0.001336737           0.005029421     0.017818263
20    NBL1         0.002041885           0.0080705       0.030194
21    Prostein     0.003319571           0.01380119      0.053866667

As shown in Table 5, the lowest p-values occur for a combination of 7 genes including MKI67, GI_2094528, HOXC6, CCK, memD, FBP1, and CDC6. Thus, the gene expression pattern for this combination of genes provides a particularly useful GEX score for predicting relapse in prostate cancer patients. FIG. 9 shows box plots for the distribution of survival p-values determined from GEX scores for the individual genes and for various combinations from two to seven of the genes. The plots indicate that the GEX score determined from each individual gene is a better predictor of prostate cancer relapse than Gleason score for all but one gene (i.e. CDC6 had a p-value near 0.152 compared to a p-value of 0.138 for Gleason score). Thus, the GEX scores derived individually from MKI67, GI_2094528, HOXC6, CCK, memD, or FBP1 provide a better predictor of prostate cancer relapse than Gleason score. Furthermore, all combinations of two or more of the 7 genes provided a better predictor of prostate cancer relapse than Gleason score. This was the case whether or not CDC6 was a member of the combination. Accordingly, the GEX score for CDC6 is reasonably predictive of prostate cancer relapse.

FIG. 10 shows the Kaplan-Meier plot for a combination of 7 genes including MKI67, GI_2094528, HOXC6, CCK, memD, FBP1, and CDC6. A p-value of 6.45×10⁻⁵ was found for the GEX score determined for the 7 genes. The plot compares very well to the Kaplan-Meier plot for 16 genes (see FIG. 4).

The results above demonstrate that a collection of 21 genes is useful for determining a GEX score that is predictive of prostate cancer relapse. The results also demonstrate methods for identifying sub-combinations of the 21 genes that are predictive of prostate cancer relapse.

Throughout this application various publications have been referenced within parentheses. The disclosures of these publications in their entireties are hereby incorporated by reference in this application in order to more fully describe the state of the art to which this invention pertains.

Although the invention has been described with reference to the disclosed embodiments, those skilled in the art will readily appreciate that the specific examples and studies detailed above are only illustrative of the invention. It should be understood that various modifications can be made without departing from the spirit of the invention. Accordingly, the invention is limited only by the following claims.

1-21. (canceled)
22. A method for predicting the probability of relapse of prostate cancer in an individual, said method comprising the steps of (a) providing a prostate tissue sample for a test individual, (b) providing expression levels for a collection of signature genes from said sample, wherein said collection of signature genes comprises CCK and at least one of GI_2094528, MKI67, MEMD, FBP1, and CDC6, (c) deriving a score that captures said expression levels for said collection of signature genes, (d) providing a reference model comprising information correlating said score with prostate cancer relapse, and (e) comparing said score to said reference model thereby determining the probability of prostate cancer relapse for said individual.
23. The method of claim 22, wherein said individual has a Gleason score between seven and eight.
24. The method of claim 23, wherein said Gleason score is based on a primary grade of four.
25. The method of claim 23, wherein said Gleason score is based on a primary grade of three.
26. The method of claim 22, further comprising providing a report having a prediction of prostate cancer relapse for said individual.
27. The method of claim 22, wherein said collection of signature genes further comprises HOXC6, KIP2, NRG1, NBL1, Prostein, CCNE2, MYBL2, PTTG1, RAMP, UBE2C, Wnt5A, AZGP1, MLCK, PPAP2B, and PROK1.
28. The method of claim 22, wherein said collection of signature genes further comprises at least one of HOXC6, CCNE2, MYBL2, PTTG1, RAMP, UBE2C, Wnt5A, AZGP1, MLCK, PPAP2B, and PROK1.
29. The method of claim 22, wherein said collection of signature genes further comprises at least two of HOXC6, CCNE2, MYBL2, PTTG1, RAMP, UBE2C, Wnt5A, AZGP1, MLCK, PPAP2B, and PROK1.
30. The method of claim 22, wherein said score is a better predictor of prostate cancer relapse than a Gleason score.
31. The method of claim 22, wherein said collection further comprises at least one gene selected from the group consisting of HOXC6, CCNE2, MYBL2, PTTG1, RAMP, UBE2C, Wnt5A, AZGP1, MLCK, PPAP2B, and PROK1.